A significant number of hotel bookings are called off through cancellations or no-shows. Typical reasons include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost. This benefits hotel guests, but for hotels it is less desirable and can reduce revenue. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing a high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# this will help in making the Python code more structured automatically (help adhere to good coding practices)
%load_ext nb_black
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# setting the precision of floating numbers to 5 decimal points
pd.set_option("display.float_format", lambda x: "%.5f" % x)
# statistical libraries for Python
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
# prediction libraries for Python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    precision_recall_curve,
    confusion_matrix,
    plot_confusion_matrix,
    make_scorer,
    roc_auc_score,
    roc_curve,
)
data = pd.read_csv(
    "C:\\Users\\Tayo Adeyo\\Downloads\\Module4_week2\\INNHotelsGroup.csv"
)
# creating a copy of the data so as not to make changes to the original data.
df = data.copy()
# Displaying the first few rows of the dataset
df.head()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00000 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68000 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00000 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00000 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50000 | 0 | Canceled |
# Checking the shape of the dataset
df.shape
(36275, 19)
# Checking the data types of the columns for the dataset
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  object
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  object
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  object
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
# Checking for duplicate entries
df.duplicated().value_counts()
False    36275
dtype: int64
# Checking for missing entries in the dataset
df.isnull().sum()
Booking_ID                              0
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
# Statistical summary of the numerical columns of the dataset
df.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_adults | 36275.00000 | 1.84496 | 0.51871 | 0.00000 | 2.00000 | 2.00000 | 2.00000 | 4.00000 |
| no_of_children | 36275.00000 | 0.10528 | 0.40265 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 10.00000 |
| no_of_weekend_nights | 36275.00000 | 0.81072 | 0.87064 | 0.00000 | 0.00000 | 1.00000 | 2.00000 | 7.00000 |
| no_of_week_nights | 36275.00000 | 2.20430 | 1.41090 | 0.00000 | 1.00000 | 2.00000 | 3.00000 | 17.00000 |
| required_car_parking_space | 36275.00000 | 0.03099 | 0.17328 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| lead_time | 36275.00000 | 85.23256 | 85.93082 | 0.00000 | 17.00000 | 57.00000 | 126.00000 | 443.00000 |
| arrival_year | 36275.00000 | 2017.82043 | 0.38384 | 2017.00000 | 2018.00000 | 2018.00000 | 2018.00000 | 2018.00000 |
| arrival_month | 36275.00000 | 7.42365 | 3.06989 | 1.00000 | 5.00000 | 8.00000 | 10.00000 | 12.00000 |
| arrival_date | 36275.00000 | 15.59700 | 8.74045 | 1.00000 | 8.00000 | 16.00000 | 23.00000 | 31.00000 |
| repeated_guest | 36275.00000 | 0.02564 | 0.15805 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 |
| no_of_previous_cancellations | 36275.00000 | 0.02335 | 0.36833 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 13.00000 |
| no_of_previous_bookings_not_canceled | 36275.00000 | 0.15341 | 1.75417 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 58.00000 |
| avg_price_per_room | 36275.00000 | 103.42354 | 35.08942 | 0.00000 | 80.30000 | 99.45000 | 120.00000 | 540.00000 |
| no_of_special_requests | 36275.00000 | 0.61966 | 0.78624 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 5.00000 |
# Statistical summary of the categorical columns of the dataset
df.describe(include="object").T
| | count | unique | top | freq |
|---|---|---|---|---|
| Booking_ID | 36275 | 36275 | INN00001 | 1 |
| type_of_meal_plan | 36275 | 4 | Meal Plan 1 | 27835 |
| room_type_reserved | 36275 | 7 | Room_Type 1 | 28130 |
| market_segment_type | 36275 | 5 | Online | 23214 |
| booking_status | 36275 | 2 | Not_Canceled | 24390 |
# The Booking_ID column contains only unique identifiers, so we drop it from the dataset.
df = df.drop("Booking_ID", axis=1)
We need to change the data type of "required_car_parking_space" and "repeated_guest" from integer to object
df["required_car_parking_space"] = df["required_car_parking_space"].apply(
    lambda x: "Yes" if x == 1 else "No"
)
df["repeated_guest"] = df["repeated_guest"].apply(lambda x: "Yes" if x == 1 else "No")
# Displaying a few random rows of the dataset
df.sample(n=10, random_state=1)
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30392 | 1 | 0 | 1 | 0 | Not Selected | No | Room_Type 1 | 53 | 2018 | 9 | 11 | Online | No | 0 | 0 | 94.32000 | 0 | Not_Canceled |
| 6685 | 2 | 0 | 1 | 2 | Meal Plan 1 | No | Room_Type 1 | 63 | 2018 | 4 | 22 | Online | No | 0 | 0 | 105.30000 | 1 | Canceled |
| 8369 | 2 | 0 | 2 | 3 | Meal Plan 1 | No | Room_Type 4 | 55 | 2018 | 9 | 11 | Online | No | 0 | 0 | 106.24000 | 0 | Not_Canceled |
| 2055 | 2 | 0 | 0 | 2 | Not Selected | No | Room_Type 1 | 53 | 2017 | 12 | 29 | Online | No | 0 | 0 | 81.00000 | 1 | Not_Canceled |
| 10969 | 1 | 0 | 2 | 4 | Meal Plan 1 | No | Room_Type 1 | 245 | 2018 | 7 | 6 | Offline | No | 0 | 0 | 110.00000 | 0 | Canceled |
| 24881 | 2 | 0 | 3 | 7 | Meal Plan 1 | No | Room_Type 2 | 231 | 2018 | 8 | 1 | Online | No | 0 | 0 | 81.82000 | 2 | Canceled |
| 28658 | 2 | 0 | 0 | 3 | Meal Plan 2 | No | Room_Type 1 | 71 | 2018 | 5 | 10 | Offline | No | 0 | 0 | 126.00000 | 1 | Not_Canceled |
| 20853 | 2 | 0 | 1 | 2 | Meal Plan 1 | No | Room_Type 1 | 66 | 2017 | 10 | 9 | Offline | No | 0 | 0 | 75.00000 | 0 | Canceled |
| 8501 | 2 | 0 | 0 | 3 | Meal Plan 1 | No | Room_Type 2 | 40 | 2018 | 1 | 14 | Online | No | 0 | 0 | 77.55000 | 1 | Not_Canceled |
| 1942 | 2 | 0 | 0 | 2 | Meal Plan 1 | No | Room_Type 1 | 63 | 2018 | 8 | 9 | Online | No | 0 | 0 | 144.90000 | 2 | Not_Canceled |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          36275 non-null  int64
 1   no_of_children                        36275 non-null  int64
 2   no_of_weekend_nights                  36275 non-null  int64
 3   no_of_week_nights                     36275 non-null  int64
 4   type_of_meal_plan                     36275 non-null  object
 5   required_car_parking_space            36275 non-null  object
 6   room_type_reserved                    36275 non-null  object
 7   lead_time                             36275 non-null  int64
 8   arrival_year                          36275 non-null  int64
 9   arrival_month                         36275 non-null  int64
 10  arrival_date                          36275 non-null  int64
 11  market_segment_type                   36275 non-null  object
 12  repeated_guest                        36275 non-null  object
 13  no_of_previous_cancellations          36275 non-null  int64
 14  no_of_previous_bookings_not_canceled  36275 non-null  int64
 15  avg_price_per_room                    36275 non-null  float64
 16  no_of_special_requests                36275 non-null  int64
 17  booking_status                        36275 non-null  object
dtypes: float64(1), int64(11), object(6)
memory usage: 5.0+ MB
df.shape
(36275, 18)
cat_cols = df.select_dtypes(["object"]).columns
cat_cols
Index(['type_of_meal_plan', 'required_car_parking_space', 'room_type_reserved',
'market_segment_type', 'repeated_guest', 'booking_status'],
dtype='object')
# Display the unique values in each of the categorical data types
for col in cat_cols:
    print(df[col].value_counts())
    print("=" * 40, "\n")
Meal Plan 1     27835
Not Selected     5130
Meal Plan 2      3305
Meal Plan 3         5
Name: type_of_meal_plan, dtype: int64
========================================

No     35151
Yes     1124
Name: required_car_parking_space, dtype: int64
========================================

Room_Type 1    28130
Room_Type 4     6057
Room_Type 6      966
Room_Type 2      692
Room_Type 5      265
Room_Type 7      158
Room_Type 3        7
Name: room_type_reserved, dtype: int64
========================================

Online           23214
Offline          10528
Corporate         2017
Complementary      391
Aviation           125
Name: market_segment_type, dtype: int64
========================================

No     35345
Yes      930
Name: repeated_guest, dtype: int64
========================================

Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
========================================
# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(15, 8), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,8))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 5))
    else:
        plt.figure(figsize=(n + 2, 5))

    plt.xticks(rotation=90, fontsize=12)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # horizontal centre of the bar
        y = p.get_height()  # height of the bar

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot
histogram_boxplot(df, "arrival_month")
labeled_barplot(df, "arrival_month", perc=True, n=None)
The busiest month for the hotels is October, with 14.7% of total bookings, followed by September and August with 12.7% and 10.5% respectively.
df.market_segment_type.value_counts()
Online           23214
Offline          10528
Corporate         2017
Complementary      391
Aviation           125
Name: market_segment_type, dtype: int64
labeled_barplot(df, "market_segment_type", perc=True, n=None)
The Online market segment makes up 64% of bookings, followed by the Offline segment, which accounts for another 29%.
labeled_barplot(df, "booking_status", perc=True, n=None)
32.8% of bookings are cancelled
histogram_boxplot(df, "no_of_adults")
histogram_boxplot(df, "no_of_children")
histogram_boxplot(df, "no_of_weekend_nights")
histogram_boxplot(df, "no_of_week_nights")
histogram_boxplot(df, "lead_time")
histogram_boxplot(df, "no_of_previous_bookings_not_canceled")
histogram_boxplot(df, "no_of_previous_cancellations")
histogram_boxplot(df, "avg_price_per_room")
histogram_boxplot(df, "no_of_special_requests")
labeled_barplot(df, "no_of_special_requests", perc=True, n=None)
labeled_barplot(df, "required_car_parking_space", perc=True, n=None)
labeled_barplot(df, "repeated_guest", perc=True, n=None)
labeled_barplot(df, "type_of_meal_plan", perc=True, n=None)
labeled_barplot(df, "room_type_reserved", perc=True, n=None)
labeled_barplot(df, "no_of_adults", perc=True, n=None)
labeled_barplot(df, "no_of_children", perc=True, n=None)
labeled_barplot(df, "no_of_weekend_nights", perc=True, n=None)
labeled_barplot(df, "no_of_week_nights", perc=True, n=None)
labeled_barplot(df, "arrival_year", perc=True, n=None)
df["booking_status"] = df["booking_status"].apply(lambda x: 1 if x == "Canceled" else 0)
col = df.select_dtypes([np.number]).columns
col
Index(['no_of_adults', 'no_of_children', 'no_of_weekend_nights',
'no_of_week_nights', 'lead_time', 'arrival_year', 'arrival_month',
'arrival_date', 'no_of_previous_cancellations',
'no_of_previous_bookings_not_canceled', 'avg_price_per_room',
'no_of_special_requests', 'booking_status'],
dtype='object')
plt.figure(figsize=(12, 7))
sns.heatmap(df[col].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()
df.groupby("market_segment_type")["avg_price_per_room"].median()
market_segment_type
Aviation         95.00000
Complementary     0.00000
Corporate        79.00000
Offline          90.00000
Online          107.10000
Name: avg_price_per_room, dtype: float64
plt.figure(figsize=(10, 5))
sns.boxplot(
    data=df, x="market_segment_type", y="avg_price_per_room",
)
<AxesSubplot:xlabel='market_segment_type', ylabel='avg_price_per_room'>
plt.figure(figsize=(10, 5))
sns.lineplot(data=data, x="market_segment_type", y="avg_price_per_room")
plt.show()
Online bookings have the highest median avg_price_per_room (107.10), followed by Aviation (95.00), Offline (90.00) and Corporate (79.00). Complementary bookings bring up the rear with a median price of 0.
# Predefined "labeled_barplot" function modified to include the "hue" parameter
def labeled_barplot_mod(data, feature, hue, perc=False, n=None):
    """
    Barplot with percentage at the top, with hue parameter included

    data: dataframe
    feature: dataframe column
    hue: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 5))
    else:
        plt.figure(figsize=(n + 2, 5))

    plt.xticks(rotation=45, fontsize=12)
    ax = sns.countplot(
        data=data,
        x=feature,
        hue=hue,
        palette="CMRmap",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of all bookings represented by this bar
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # horizontal centre of the bar
        y = p.get_height()  # height of the bar

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage
labeled_barplot_mod(df, "repeated_guest", hue="booking_status", perc=True, n=None)
labeled_barplot_mod(df, "repeated_guest", hue=None, perc=True, n=None)
df.groupby("repeated_guest")["booking_status"].sum()
repeated_guest
No     11869
Yes       16
Name: booking_status, dtype: int64
df.groupby("repeated_guest")["booking_status"].value_counts(normalize=True)
repeated_guest booking_status
No 0 0.66420
1 0.33580
Yes 0 0.98280
1 0.01720
Name: booking_status, dtype: float64
df.groupby("repeated_guest")["booking_status"].value_counts()
repeated_guest booking_status
No 0 23476
1 11869
Yes 0 914
1 16
Name: booking_status, dtype: int64
Repeat guests account for 2.6% of total bookings (930 out of 36275). Of these 930 bookings, only 1.72% (16) were cancelled, whereas 33.58% (11869 out of 35345) of bookings made by new guests were cancelled.
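Because booking_status is now coded as 1 for cancelled and 0 otherwise, the mean of that column within a group is the group's cancellation rate. The sketch below shows this on a small made-up frame; the toy rows are illustrative assumptions, not records from the dataset.

```python
import pandas as pd

# Toy data (hypothetical rows): 1 = cancelled, 0 = not cancelled
toy = pd.DataFrame(
    {
        "repeated_guest": ["No", "No", "No", "No", "Yes", "Yes"],
        "booking_status": [1, 0, 1, 0, 0, 0],
    }
)

# The mean of a 0/1 column is the share of 1s, i.e. the cancellation rate per group
cancel_rate = toy.groupby("repeated_guest")["booking_status"].mean()
print(cancel_rate)
```

Applied to df, the same one-liner should reproduce the normalized value counts shown above in a more compact form.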
labeled_barplot_mod(df, "no_of_special_requests", hue="booking_status", perc=True, n=10)
df.groupby("no_of_special_requests")["booking_status"].value_counts()
no_of_special_requests booking_status
0 0 11232
1 8545
1 0 8670
1 2703
2 0 3727
1 637
3 0 675
4 0 78
5 0 8
Name: booking_status, dtype: int64
df.groupby("no_of_special_requests")["booking_status"].value_counts(normalize=True)
no_of_special_requests booking_status
0 0 0.56793
1 0.43207
1 0 0.76233
1 0.23767
2 0 0.85403
1 0.14597
3 0 1.00000
4 0 1.00000
5 0 1.00000
Name: booking_status, dtype: float64
Bookings with no special requests have a 43.2% chance of being cancelled. Bookings with one special request have a 23.77% chance of cancellation, and bookings with two special requests a 14.6% chance. None of the bookings with 3 or more special requests in this dataset were cancelled.
labeled_barplot_mod(df, "arrival_year", hue="booking_status", perc=True, n=None)
labeled_barplot_mod(df, "arrival_month", hue="booking_status", perc=True, n=18)
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x="no_of_week_nights", hue="booking_status")
plt.show()
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x="no_of_weekend_nights", hue="booking_status")
plt.show()
labeled_barplot_mod(df, "market_segment_type", hue="booking_status", perc=True, n=10)
labeled_barplot_mod(df, "room_type_reserved", hue="booking_status", perc=True, n=10)
plt.figure(figsize=(12, 5))
sns.countplot(data=df, x="room_type_reserved", hue="booking_status")
<AxesSubplot:xlabel='room_type_reserved', ylabel='count'>
labeled_barplot_mod(df, "type_of_meal_plan", hue="booking_status", perc=True, n=10)
df.market_segment_type.value_counts()
Online           23214
Offline          10528
Corporate         2017
Complementary      391
Aviation           125
Name: market_segment_type, dtype: int64
df.type_of_meal_plan.value_counts()
Meal Plan 1     27835
Not Selected     5130
Meal Plan 2      3305
Meal Plan 3         5
Name: type_of_meal_plan, dtype: int64
Most guests choose Meal Plan 1 (breakfast) when placing their bookings.
plt.figure(figsize=(12, 7))
sns.countplot(data=df, x="market_segment_type", hue="type_of_meal_plan")
<AxesSubplot:xlabel='market_segment_type', ylabel='count'>
plt.figure(figsize=(10, 5))
sns.countplot(data=data, x="arrival_month")
plt.show()
plt.figure(figsize=(10, 5))
sns.lineplot(data=data, x="arrival_month", y="avg_price_per_room")
plt.show()
avg_price_per_room seems to follow a similar trend to the number of bookings per arrival_month. Prices rise gradually between January and May, then remain fairly stable, albeit with a slight increase towards mid-September, before decreasing gradually between late September and December.
num_columns = df.select_dtypes(include=["float64", "int64"]).columns.tolist()
num_columns
['no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'lead_time', 'arrival_year', 'arrival_month', 'arrival_date', 'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled', 'avg_price_per_room', 'no_of_special_requests', 'booking_status']
num_columns.remove("booking_status")
plt.figure(figsize=(15, 20))
for j, column in enumerate(num_columns):
    plt.subplot(5, 4, j + 1)
    sns.boxplot(data=df, x=column)
    plt.tight_layout()
plt.show()
There are outliers in all the plotted columns except arrival_month and arrival_date. Since these outliers are real values and dropping them would discard important information, I choose to leave them as they are.
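The points drawn beyond the boxplot whiskers are, by seaborn's default, values lying more than 1.5 times the interquartile range outside the quartiles. A minimal sketch of counting them directly is given below; the helper name `iqr_outlier_share` and the toy values are assumptions made for illustration only.

```python
import pandas as pd


def iqr_outlier_share(series: pd.Series) -> float:
    """Share of values lying outside the 1.5 * IQR whiskers."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return ((series < lower) | (series > upper)).mean()


# Hypothetical values, only to exercise the helper
toy_lead_time = pd.Series([0, 5, 17, 40, 57, 80, 126, 130, 443])
share = iqr_outlier_share(toy_lead_time)
print(f"{share:.1%} of the toy values fall outside the whiskers")
```

Running the helper on each entry of num_columns would quantify how much data would be lost if the outliers were dropped, which supports the decision to keep them.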
# Identifying the categorical variables
cat_columns = df.select_dtypes(include=["object"]).columns.tolist()
cat_columns
['type_of_meal_plan', 'required_car_parking_space', 'room_type_reserved', 'market_segment_type', 'repeated_guest']
# defining X and y variables
X = df.drop(["booking_status"], axis=1)
Y = df["booking_status"]
# Adding a constant to the X variables
X = sm.add_constant(X)
# creating dummies for X from the categorical variables
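# columns= must be passed by keyword (the second positional argument is "prefix");
# dtype=float keeps the dummies numeric for the VIF and Logit calls below
X = pd.get_dummies(X, columns=cat_columns, drop_first=True, dtype=float)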
# Splitting data to train and test sets in the ratio 70:30
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
X_train.shape
(25392, 28)
X_test.shape
(10883, 28)
y_train.shape
(25392,)
y_train.value_counts()
0    17029
1     8363
Name: booking_status, dtype: int64
y_test.value_counts()
0    7361
1    3522
Name: booking_status, dtype: int64
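The split above leaves the cancelled share slightly different between the two sets (8363 of 25392 in train versus 3522 of 10883 in test). If identical class proportions are wanted, `train_test_split` accepts a `stratify` argument. The sketch below uses a synthetic target as a stand-in; it is an optional variant, not the split used in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: one third positives, roughly as in this dataset
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(300, 2))
y_toy = np.array([1] * 100 + [0] * 200)

# stratify=y_toy asks for the same class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=1, stratify=y_toy
)
print(y_tr.mean(), y_te.mean())
```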
# function to check VIF.
def checking_vif(predictors):
    vif = pd.DataFrame()
    vif["feature"] = predictors.columns

    # calculating VIF for each feature
    vif["VIF"] = [
        variance_inflation_factor(predictors.values, i)
        for i in range(len(predictors.columns))
    ]
    return vif
# checking VIF on train data
checking_vif(X_train)
| | feature | VIF |
|---|---|---|
| 0 | const | 39468156.70600 |
| 1 | no_of_adults | 1.34815 |
| 2 | no_of_children | 1.97823 |
| 3 | no_of_weekend_nights | 1.06948 |
| 4 | no_of_week_nights | 1.09567 |
| 5 | lead_time | 1.39491 |
| 6 | arrival_year | 1.43083 |
| 7 | arrival_month | 1.27567 |
| 8 | arrival_date | 1.00674 |
| 9 | no_of_previous_cancellations | 1.39569 |
| 10 | no_of_previous_bookings_not_canceled | 1.65199 |
| 11 | avg_price_per_room | 2.05042 |
| 12 | no_of_special_requests | 1.24728 |
| 13 | type_of_meal_plan_Meal Plan 2 | 1.27185 |
| 14 | type_of_meal_plan_Meal Plan 3 | 1.02522 |
| 15 | type_of_meal_plan_Not Selected | 1.27218 |
| 16 | required_car_parking_space_Yes | 1.03993 |
| 17 | room_type_reserved_Room_Type 2 | 1.10144 |
| 18 | room_type_reserved_Room_Type 3 | 1.00330 |
| 19 | room_type_reserved_Room_Type 4 | 1.36152 |
| 20 | room_type_reserved_Room_Type 5 | 1.02781 |
| 21 | room_type_reserved_Room_Type 6 | 1.97307 |
| 22 | room_type_reserved_Room_Type 7 | 1.11512 |
| 23 | market_segment_type_Complementary | 4.50011 |
| 24 | market_segment_type_Corporate | 16.92844 |
| 25 | market_segment_type_Offline | 64.11392 |
| 26 | market_segment_type_Online | 71.17643 |
| 27 | repeated_guest_Yes | 1.78352 |
col_to_drop = "market_segment_type_Online"
X_train1 = X_train.loc[:, ~X_train.columns.str.startswith(col_to_drop)]
X_test1 = X_test.loc[:, ~X_test.columns.str.startswith(col_to_drop)]
X_train1.shape
(25392, 27)
# checking VIF on train data
checking_vif(X_train1)
| | feature | VIF |
|---|---|---|
| 0 | const | 39391371.31459 |
| 1 | no_of_adults | 1.33178 |
| 2 | no_of_children | 1.97735 |
| 3 | no_of_weekend_nights | 1.06904 |
| 4 | no_of_week_nights | 1.09512 |
| 5 | lead_time | 1.39064 |
| 6 | arrival_year | 1.42838 |
| 7 | arrival_month | 1.27463 |
| 8 | arrival_date | 1.00672 |
| 9 | no_of_previous_cancellations | 1.39545 |
| 10 | no_of_previous_bookings_not_canceled | 1.65175 |
| 11 | avg_price_per_room | 2.04959 |
| 12 | no_of_special_requests | 1.24242 |
| 13 | type_of_meal_plan_Meal Plan 2 | 1.27150 |
| 14 | type_of_meal_plan_Meal Plan 3 | 1.02522 |
| 15 | type_of_meal_plan_Not Selected | 1.27039 |
| 16 | required_car_parking_space_Yes | 1.03979 |
| 17 | room_type_reserved_Room_Type 2 | 1.10127 |
| 18 | room_type_reserved_Room_Type 3 | 1.00330 |
| 19 | room_type_reserved_Room_Type 4 | 1.35600 |
| 20 | room_type_reserved_Room_Type 5 | 1.02781 |
| 21 | room_type_reserved_Room_Type 6 | 1.97273 |
| 22 | room_type_reserved_Room_Type 7 | 1.11500 |
| 23 | market_segment_type_Complementary | 1.33825 |
| 24 | market_segment_type_Corporate | 1.52777 |
| 25 | market_segment_type_Offline | 1.59742 |
| 26 | repeated_guest_Yes | 1.78019 |
No multicollinearity remains among the X variables after dropping the "market_segment_type_Online" column: every VIF other than the constant's is now below 2.1.
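For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it with plain NumPy on synthetic columns to show how a near-duplicate column inflates the value; the helper `vif` and the toy data are assumptions for illustration, not a replacement for `variance_inflation_factor`.

```python
import numpy as np


def vif(X: np.ndarray, j: int) -> float:
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)  # independent of a
c = a + rng.normal(scale=0.1, size=500)  # almost a copy of a
X_toy = np.column_stack([a, b, c])

vifs = [vif(X_toy, j) for j in range(X_toy.shape[1])]
print(vifs)  # columns a and c get large VIFs, b stays close to 1
```

A plausible reading of the tables above is that the market segment dummies were nearly collinear with the constant, because the reference level (Aviation) holds only 125 bookings, so the remaining dummies almost always sum to 1.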
In the case at hand, we want to minimise false positives and false negatives as much as possible, because:
For the two reasons above, we will choose the F1 score as our performance metric. F1 is the harmonic mean of precision and recall, so maximising it requires keeping both false positives and false negatives low.
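To make the choice of metric concrete, F1 = 2 · precision · recall / (precision + recall). The small sketch below, with a hypothetical helper `f1_from` and made-up inputs, shows that F1 stays high only when both components are high.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Balanced case: both components are equal, so F1 equals them
balanced = f1_from(0.8, 0.8)

# Lopsided case: a weak recall drags F1 well below the arithmetic mean (0.55)
lopsided = f1_from(0.9, 0.2)

print(round(balanced, 3), round(lopsided, 3))
```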
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )

    return df_perf
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
# Building the logistic regression model and printing the summary
logit1 = sm.Logit(y_train, X_train1.astype(float))
lg1 = logit1.fit()
print(lg1.summary())
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.425084
Iterations: 35
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25365
Method: MLE Df Model: 26
Date: Fri, 17 Feb 2023 Pseudo R-squ.: 0.3292
Time: 13:13:22 Log-Likelihood: -10794.
converged: False LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -933.3324 120.655 -7.736 0.000 -1169.813 -696.852
no_of_adults 0.1060 0.037 2.841 0.004 0.033 0.179
no_of_children 0.1542 0.057 2.694 0.007 0.042 0.266
no_of_weekend_nights 0.1075 0.020 5.439 0.000 0.069 0.146
no_of_week_nights 0.0405 0.012 3.295 0.001 0.016 0.065
lead_time 0.0157 0.000 58.933 0.000 0.015 0.016
arrival_year 0.4611 0.060 7.711 0.000 0.344 0.578
arrival_month -0.0411 0.006 -6.358 0.000 -0.054 -0.028
arrival_date 0.0005 0.002 0.257 0.797 -0.003 0.004
no_of_previous_cancellations 0.2633 0.086 3.074 0.002 0.095 0.431
no_of_previous_bookings_not_canceled -0.1728 0.152 -1.136 0.256 -0.471 0.125
avg_price_per_room 0.0187 0.001 25.374 0.000 0.017 0.020
no_of_special_requests -1.4709 0.030 -48.891 0.000 -1.530 -1.412
type_of_meal_plan_Meal Plan 2 0.1794 0.067 2.694 0.007 0.049 0.310
type_of_meal_plan_Meal Plan 3 19.8256 1.36e+04 0.001 0.999 -2.67e+04 2.67e+04
type_of_meal_plan_Not Selected 0.2745 0.053 5.181 0.000 0.171 0.378
required_car_parking_space_Yes -1.5907 0.138 -11.538 0.000 -1.861 -1.320
room_type_reserved_Room_Type 2 -0.3640 0.131 -2.784 0.005 -0.620 -0.108
room_type_reserved_Room_Type 3 -0.0018 1.310 -0.001 0.999 -2.569 2.566
room_type_reserved_Room_Type 4 -0.2763 0.053 -5.207 0.000 -0.380 -0.172
room_type_reserved_Room_Type 5 -0.7182 0.209 -3.436 0.001 -1.128 -0.308
room_type_reserved_Room_Type 6 -0.9408 0.147 -6.402 0.000 -1.229 -0.653
room_type_reserved_Room_Type 7 -1.3891 0.293 -4.743 0.000 -1.963 -0.815
market_segment_type_Complementary -47.7454 7.09e+06 -6.74e-06 1.000 -1.39e+07 1.39e+07
market_segment_type_Corporate -0.8033 0.103 -7.807 0.000 -1.005 -0.602
market_segment_type_Offline -1.7995 0.052 -34.577 0.000 -1.902 -1.698
repeated_guest_Yes -2.3140 0.618 -3.743 0.000 -3.526 -1.102
========================================================================================================
# Print the training performance
model_performance_classification_statsmodels(lg1, X_train1, y_train)
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80577 | 0.63374 | 0.73929 | 0.68246 |
Negative coefficient values show that the probability of a booking being cancelled decreases as the corresponding attribute value increases.
Positive coefficient values show that the probability of a booking being cancelled increases as the corresponding attribute value increases.
The p-value of a variable indicates whether the variable is significant. At a significance level of 0.05 (5%), any variable with a p-value below 0.05 is considered significant.
Several predictor variables have p-values greater than 0.05, so there is no statistically significant evidence that they affect the target variable.
We will therefore drop them one at a time, removing the variable with the highest p-value first and refitting the model after each removal.
# initial list of columns
cols = X_train1.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    x_train_aux = X_train1[cols]
    # fitting the model
    model = sm.Logit(y_train, x_train_aux).fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        # drop the least significant variable and refit
        cols.remove(feature_with_p_max)
    else:
        # every remaining variable is significant at the 5% level
        break
selected_features = cols
print(selected_features)
['const', 'no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'lead_time', 'arrival_year', 'arrival_month', 'no_of_previous_cancellations', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Not Selected', 'required_car_parking_space_Yes', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'repeated_guest_Yes']
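The p-values that drive this elimination can be checked by hand against the summary table. The sketch below assumes the `P>|z|` column is the usual two-sided p-value of a standard-normal z statistic; the helper name `wald_p_value` is illustrative and not part of statsmodels.

```python
import math


def wald_p_value(z):
    """Two-sided p-value for a standard-normal z statistic."""
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2))


# z statistics copied from the first model summary
p_arrival_date = wald_p_value(0.257)    # arrival_date, dropped by the loop
p_no_of_children = wald_p_value(2.694)  # no_of_children, kept by the loop

print(f"arrival_date:   p = {p_arrival_date:.4f}")
print(f"no_of_children: p = {p_no_of_children:.4f}")
```

Both values should land close to the 0.797 and 0.007 reported in the summary, which is why `arrival_date` is removed while `no_of_children` survives the 0.05 cut-off.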
X_train2 = X_train1[selected_features]
X_test2 = X_test1[selected_features]
# Building the new logistic regression model and printing its summary
logit2 = sm.Logit(y_train, X_train2.astype(float))
lg2 = logit2.fit()
print(lg2.summary())
Optimization terminated successfully.
Current function value: 0.425677
Iterations 11
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25370
Method: MLE Df Model: 21
Date: Fri, 17 Feb 2023 Pseudo R-squ.: 0.3283
Time: 13:13:23 Log-Likelihood: -10809.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -917.2860 120.456 -7.615 0.000 -1153.376 -681.196
no_of_adults 0.1086 0.037 2.914 0.004 0.036 0.182
no_of_children 0.1522 0.057 2.660 0.008 0.040 0.264
no_of_weekend_nights 0.1086 0.020 5.501 0.000 0.070 0.147
no_of_week_nights 0.0418 0.012 3.403 0.001 0.018 0.066
lead_time 0.0157 0.000 59.218 0.000 0.015 0.016
arrival_year 0.4531 0.060 7.591 0.000 0.336 0.570
arrival_month -0.0424 0.006 -6.568 0.000 -0.055 -0.030
no_of_previous_cancellations 0.2289 0.077 2.983 0.003 0.078 0.379
avg_price_per_room 0.0192 0.001 26.343 0.000 0.018 0.021
no_of_special_requests -1.4699 0.030 -48.892 0.000 -1.529 -1.411
type_of_meal_plan_Meal Plan 2 0.1654 0.067 2.487 0.013 0.035 0.296
type_of_meal_plan_Not Selected 0.2858 0.053 5.405 0.000 0.182 0.389
required_car_parking_space_Yes -1.5943 0.138 -11.561 0.000 -1.865 -1.324
room_type_reserved_Room_Type 2 -0.3560 0.131 -2.725 0.006 -0.612 -0.100
room_type_reserved_Room_Type 4 -0.2826 0.053 -5.330 0.000 -0.387 -0.179
room_type_reserved_Room_Type 5 -0.7352 0.208 -3.529 0.000 -1.143 -0.327
room_type_reserved_Room_Type 6 -0.9650 0.147 -6.572 0.000 -1.253 -0.677
room_type_reserved_Room_Type 7 -1.4312 0.293 -4.892 0.000 -2.005 -0.858
market_segment_type_Corporate -0.7928 0.103 -7.711 0.000 -0.994 -0.591
market_segment_type_Offline -1.7867 0.052 -34.391 0.000 -1.889 -1.685
repeated_guest_Yes -2.7365 0.557 -4.915 0.000 -3.828 -1.645
==================================================================================================
# converting coefficients to odds
odds = np.exp(lg2.params)
# finding the percentage change
perc_change_odds = (np.exp(lg2.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train2.columns).T
| | const | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | lead_time | arrival_year | arrival_month | no_of_previous_cancellations | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Not Selected | required_car_parking_space_Yes | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Corporate | market_segment_type_Offline | repeated_guest_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 0.00000 | 1.11475 | 1.16436 | 1.11475 | 1.04264 | 1.01584 | 1.57324 | 0.95853 | 1.25716 | 1.01935 | 0.22994 | 1.17992 | 1.33089 | 0.20305 | 0.70046 | 0.75383 | 0.47940 | 0.38099 | 0.23903 | 0.45258 | 0.16750 | 0.06480 |
| Change_odd% | -100.00000 | 11.47536 | 16.43601 | 11.47526 | 4.26363 | 1.58352 | 57.32351 | -4.14725 | 25.71567 | 1.93479 | -77.00595 | 17.99156 | 33.08924 | -79.69523 | -29.95389 | -24.61701 | -52.05967 | -61.90093 | -76.09669 | -54.74162 | -83.24963 | -93.52026 |
no_of_adults: holding all other features constant, each additional adult multiplies the odds of a booking being cancelled by 1.115, an 11.48% increase in the odds.
no_of_previous_cancellations: holding all other features constant, each additional previous cancellation multiplies the odds of a booking being cancelled by 1.257, a 25.72% increase in the odds.
no_of_special_requests: holding all other features constant, each additional special request multiplies the odds of a booking being cancelled by 0.23, a 77% decrease in the odds.
repeated_guest: holding all other features constant, the odds of a repeated guest cancelling a booking are 0.065 times those of a first-time guest, a 93.52% decrease in the odds.
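The link between a coefficient and the figures quoted above is just an exponential. As a minimal sketch, the snippet below recomputes a few entries from the rounded coefficients in the `lg2` summary; the helper `odds_ratio` is a hypothetical name used only for this illustration, and small differences from the table come from rounding the coefficients to four decimals.

```python
import math


def odds_ratio(coef):
    """Return (odds ratio, percentage change in odds) for a logit coefficient."""
    ratio = math.exp(coef)
    return ratio, (ratio - 1) * 100


# coefficients copied (rounded to 4 decimals) from the lg2 summary
coefs = {
    "no_of_adults": 0.1086,
    "no_of_special_requests": -1.4699,
    "repeated_guest_Yes": -2.7365,
}

results = {name: odds_ratio(c) for name, c in coefs.items()}
for name, (ratio, pct) in results.items():
    print(f"{name}: odds ratio {ratio:.3f}, change in odds {pct:.1f}%")
```

An odds ratio above 1 raises the odds of cancellation and one below 1 lowers them, so the sign of the coefficient and the direction of the percentage change always agree.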
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_train2, y_train)
# Calculate and Print the training performance
log_reg_model_train_perf = model_performance_classification_statsmodels(
lg2, X_train2, y_train
)
print("Training performance:")
log_reg_model_train_perf
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80541 | 0.63255 | 0.73903 | 0.68166 |
# creating the confusion matrix of the test set
confusion_matrix_statsmodels(lg2, X_test2, y_test)
# Calculate and Print the test performance
log_reg_model_test_perf = model_performance_classification_statsmodels(
lg2, X_test2, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80465 | 0.63089 | 0.72900 | 0.67641 |
logit_roc_auc_train = roc_auc_score(y_train, lg2.predict(X_train2))
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# Optimal threshold as per AUC-ROC curve
# The optimal cut-off is where tpr (True Positive Rate) is high and fpr (False Positive Rate) is low
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train2))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.3710466623489539
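Maximising `tpr - fpr` is the criterion known as Youden's J statistic. The dependency-free sketch below shows the same selection on a handful of invented labels and scores; `youden_threshold`, `y_toy` and `p_toy` are illustrative names, and it assumes a score at or above the cut-off counts as a predicted cancellation.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximising J = TPR - FPR over the observed scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_threshold, best_j = None, float("-inf")
    # try every distinct score as a cut-off, highest first
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# toy labels and predicted probabilities, invented for illustration
y_toy = [0, 0, 0, 0, 1, 1, 1]
p_toy = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.9]

best_threshold, best_j = youden_threshold(y_toy, p_toy)
print(best_threshold, best_j)  # 0.4 0.75
```

On the toy data the cut-off 0.4 captures every positive while admitting one of the four negatives, which gives the largest gap between the two rates.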
Checking model performance on the training set
# creating confusion matrix
confusion_matrix_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79289 | 0.73562 | 0.66870 | 0.70056 |
Checking model performance on the test set
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test2, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg2, X_test2, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79601 | 0.73935 | 0.66667 | 0.70113 |
y_scores = lg2.predict(X_train2)
prec, rec, tre = precision_recall_curve(y_train, y_scores)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    # precision and recall arrays hold one more element than thresholds
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
# setting the threshold
optimal_threshold_curve = 0.42
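The value 0.42 appears to have been read off the plot. One way to pick such a point programmatically, offered here only as an assumption about the intent, is to take the cut-off where precision and recall are closest. The sketch below does that on invented data with plain Python; all function and variable names are illustrative.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting 1 for scores >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def crossing_threshold(y_true, scores):
    """Observed score at which |precision - recall| is smallest."""

    def gap(t):
        precision, recall = precision_recall_at(y_true, scores, t)
        return abs(precision - recall)

    return min(sorted(set(scores)), key=gap)


# toy labels and predicted probabilities, invented for illustration
y_toy = [0, 0, 0, 0, 1, 1, 1]
p_toy = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.9]

balanced_threshold = crossing_threshold(y_toy, p_toy)
print(balanced_threshold)  # 0.6
```

At the returned cut-off both precision and recall equal 2/3 on the toy data, mirroring the near-equal precision and recall reported for the 0.42 threshold.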
Checking model performance on the training set
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_train2, y_train, threshold=optimal_threshold_curve)
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80128 | 0.69939 | 0.69789 | 0.69864 |
Checking model performance on the test set
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test2, y_test, threshold=optimal_threshold_curve)
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
lg2, X_test2, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80364 | 0.70386 | 0.69381 | 0.69880 |
# Training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression-default Threshold (0.5)",
"Logistic Regression-0.37 Threshold",
"Logistic Regression-0.42 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression-default Threshold (0.5) | Logistic Regression-0.37 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.80541 | 0.79289 | 0.80128 |
| Recall | 0.63255 | 0.73562 | 0.69939 |
| Precision | 0.73903 | 0.66870 | 0.69789 |
| F1 | 0.68166 | 0.70056 | 0.69864 |
# Testing performance comparison
models_test_comp_df = pd.concat(
[
log_reg_model_test_perf.T,
log_reg_model_test_perf_threshold_auc_roc.T,
log_reg_model_test_perf_threshold_curve.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Logistic Regression-default Threshold (0.5)",
"Logistic Regression-0.37 Threshold",
"Logistic Regression-0.42 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Logistic Regression-default Threshold (0.5) | Logistic Regression-0.37 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.80465 | 0.79601 | 0.80364 |
| Recall | 0.63089 | 0.73935 | 0.70386 |
| Precision | 0.72900 | 0.66667 | 0.69381 |
| F1 | 0.67641 | 0.70113 | 0.69880 |
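If F1 is taken as the deciding metric, the comparison reduces to a one-line lookup. The sketch below copies the test-set F1 values from the table; the dictionary and its labels are illustrative only.

```python
# test-set F1 scores copied from the comparison table
test_f1 = {
    "default threshold (0.5)": 0.67641,
    "0.37 threshold": 0.70113,
    "0.42 threshold": 0.69880,
}

# pick the setting with the highest test-set F1
best_setting = max(test_f1, key=test_f1.get)
print(best_setting, test_f1[best_setting])  # 0.37 threshold 0.70113
```

The margin over the 0.42 threshold is small (about 0.002), so the choice between the two can reasonably be driven by whether recall or precision matters more to the business.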
df.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | No | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | No | 0 | 0 | 65.00000 | 0 | 0 |
| 1 | 2 | 0 | 2 | 3 | Not Selected | No | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | No | 0 | 0 | 106.68000 | 1 | 0 |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | No | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | No | 0 | 0 | 60.00000 | 0 | 1 |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | No | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | No | 0 | 0 | 100.00000 | 0 | 1 |
| 4 | 2 | 0 | 1 | 1 | Not Selected | No | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | No | 0 | 0 | 94.50000 | 0 | 1 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          36275 non-null  int64
 1   no_of_children                        36275 non-null  int64
 2   no_of_weekend_nights                  36275 non-null  int64
 3   no_of_week_nights                     36275 non-null  int64
 4   type_of_meal_plan                     36275 non-null  object
 5   required_car_parking_space            36275 non-null  object
 6   room_type_reserved                    36275 non-null  object
 7   lead_time                             36275 non-null  int64
 8   arrival_year                          36275 non-null  int64
 9   arrival_month                         36275 non-null  int64
 10  arrival_date                          36275 non-null  int64
 11  market_segment_type                   36275 non-null  object
 12  repeated_guest                        36275 non-null  object
 13  no_of_previous_cancellations          36275 non-null  int64
 14  no_of_previous_bookings_not_canceled  36275 non-null  int64
 15  avg_price_per_room                    36275 non-null  float64
 16  no_of_special_requests                36275 non-null  int64
 17  booking_status                        36275 non-null  int64
dtypes: float64(1), int64(12), object(5)
memory usage: 5.0+ MB
df.shape
(36275, 18)
X = df.drop(["booking_status"], axis=1)
Y = df["booking_status"]
# create dummies for the categorical variables in X
X = pd.get_dummies(X, columns=cat_columns, drop_first=True)
X.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | lead_time | arrival_year | arrival_month | arrival_date | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | required_car_parking_space_Yes | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online | repeated_guest_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | 224 | 2017 | 10 | 2 | 0 | 0 | 65.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 2 | 0 | 2 | 3 | 5 | 2018 | 11 | 6 | 0 | 0 | 106.68000 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 1 | 0 | 2 | 1 | 1 | 2018 | 2 | 28 | 0 | 0 | 60.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 2 | 0 | 0 | 2 | 211 | 2018 | 5 | 20 | 0 | 0 | 100.00000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 2 | 0 | 1 | 1 | 48 | 2018 | 4 | 11 | 0 | 0 | 94.50000 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
# Splitting data into train and test sets in the ratio 70:30
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 25392
Number of rows in test data = 10883
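The two counts follow from the 36275 rows and the 0.3 test share. The arithmetic below is a sketch that assumes the test share is rounded up and the remainder goes to training, which is consistent with the printed numbers.

```python
import math

n_samples = 36275  # rows in df, from df.shape
test_size = 0.3

# assumption: the test share is rounded up, the rest is used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 25392 10883
```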
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("=" * 40)
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
0    0.67064
1    0.32936
Name: booking_status, dtype: float64
========================================
Percentage of classes in test set:
0    0.67638
1    0.32362
Name: booking_status, dtype: float64
model = DecisionTreeClassifier(criterion="gini", random_state=1)
model.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
As mentioned in the logistic regression section, our model can make wrong predictions in two ways: predicting that a booking will be cancelled when it is not (a false positive), or predicting that a booking will not be cancelled when it is (a false negative).
The business is best served by keeping false positives and false negatives low at the same time. We have therefore chosen the F1 score as our performance metric: because F1 is the harmonic mean of precision and recall, a high F1 score requires both few false positives and few false negatives.
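To make the role of F1 concrete, the sketch below computes precision, recall and F1 directly from confusion-matrix counts. The counts are invented for illustration and `f1_from_counts` is a hypothetical helper, not part of scikit-learn.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# balanced errors: 20 false positives and 20 false negatives
balanced = f1_from_counts(tp=80, fp=20, fn=20)

# lopsided errors: no false negatives but 60 false positives
lopsided = f1_from_counts(tp=80, fp=60, fn=0)

print("balanced:", [round(v, 3) for v in balanced])
print("lopsided:", [round(v, 3) for v in lopsided])
```

In the lopsided case recall is perfect, yet F1 drops to about 0.727 because precision is low. The harmonic mean is pulled toward the weaker of the two, so a model cannot score well on F1 by suppressing only one kind of error.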
# defining a function to compute different metrics to check performance of a classification model built using sklearn
def model_performance_classification_sklearn(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    # predicting using the independent variables
    pred = model.predict(predictors)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1},
        index=[0],
    )
    return df_perf
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    # each cell label shows the count and its share of all observations
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
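The annotation labels built in this function combine each cell's count with its share of all observations. The plain-Python sketch below reproduces that formatting on an invented 2x2 matrix, without numpy or seaborn; `cm_toy` and `labels_toy` are illustrative names.

```python
# toy 2x2 confusion matrix (rows: true class, columns: predicted class)
cm_toy = [[50, 10], [5, 35]]
total = sum(sum(row) for row in cm_toy)

# same format strings as in confusion_matrix_sklearn: count, newline, percentage
labels_toy = [
    ["{0:0.0f}".format(v) + "\n{0:.2%}".format(v / total) for v in row]
    for row in cm_toy
]

for row in labels_toy:
    print([label.replace("\n", " | ") for label in row])
```

Each label is a two-line string, which is why the heatmap is called with `fmt=""`: the strings are drawn as given and no numeric formatting is applied to them.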
# create the confusion matrix for train data
confusion_matrix_sklearn(model, X_train, y_train)
# Calculate the different performance metrics for the training set
decision_tree_perf_train = model_performance_classification_sklearn(
model, X_train, y_train
)
decision_tree_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.99421 | 0.98661 | 0.99578 | 0.99117 |
# create the confusion matrix for test data
confusion_matrix_sklearn(model, X_test, y_test)
# Calculate the different performance metrics for the test set
decision_tree_perf_test = model_performance_classification_sklearn(
model, X_test, y_test
)
decision_tree_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.87090 | 0.81034 | 0.79476 | 0.80247 |
column_names = list(X.columns)
feature_names = column_names
print(feature_names)
['no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'lead_time', 'arrival_year', 'arrival_month', 'arrival_date', 'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'required_car_parking_space_Yes', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online', 'repeated_guest_Yes']
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
# draw the arrows between nodes in black so they stay visible
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- avg_price_per_room <= 201.50 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time <= 16.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- repeated_guest_Yes <= 0.50 | | | | | | | | | | |--- lead_time <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- lead_time > 11.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- repeated_guest_Yes > 0.50 | | | | | | | | | | |--- weights: [147.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time > 16.50 | | | | | | | | |--- avg_price_per_room <= 135.00 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 17.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 17.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [29.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 135.00 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- weights: [1609.00, 0.00] class: 0 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- lead_time <= 
68.50 | | | | | | | |--- no_of_weekend_nights <= 4.50 | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- lead_time <= 65.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- lead_time > 65.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- no_of_weekend_nights > 4.50 | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | |--- lead_time > 68.50 | | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | | | |--- lead_time <= 77.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 77.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | |--- lead_time <= 71.50 | | | | | | | | | | |--- arrival_month <= 8.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- arrival_month > 8.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] 
class: 0 | | | | | | | | | |--- lead_time > 71.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 123.25 | | | | | | | | | | |--- weights: [0.00, 52.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 123.25 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | |--- avg_price_per_room <= 131.67 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 131.67 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 201.50 | | | | | |--- arrival_date <= 28.00 | | | | | | |--- weights: [0.00, 17.00] class: 1 | | | | | |--- arrival_date > 28.00 | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- arrival_date <= 6.50 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- arrival_month <= 3.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 3.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- avg_price_per_room <= 90.47 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | 
| | | |--- avg_price_per_room > 90.47 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- arrival_date <= 5.00 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 5.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | |--- weights: [35.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | |--- avg_price_per_room <= 75.22 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 75.22 | | | | | | | | | | |--- arrival_month <= 9.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 9.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- arrival_date > 6.50 | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 62.25 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 62.25 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | |--- weights: [24.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | | |--- lead_time <= 97.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 97.50 | | | | | | | | | | | |--- weights: [0.00, 39.00] class: 1 | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- no_of_weekend_nights <= 
3.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- lead_time <= 96.00 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 96.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- avg_price_per_room <= 82.50 | | | | | | | | | | |--- arrival_date <= 17.00 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- arrival_date > 17.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 82.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [11.00, 2.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- arrival_date <= 16.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- lead_time <= 108.50 | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 125.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 125.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 108.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- weights: [12.00, 1.00] class: 0 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- 
avg_price_per_room <= 108.50
| | | | | | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | | | | | |--- weights: [0.00, 47.00] class: 1
| | | | | | | | | |--- no_of_weekend_nights > 0.50
| | | | | | | | | | |--- lead_time <= 113.50
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | | | |--- lead_time > 113.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- avg_price_per_room > 108.50
| | | | | | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | | | | | |--- weights: [42.00, 0.00] class: 0
| | | | | | | | | |--- no_of_weekend_nights > 0.50
| | | | | | | | | | |--- arrival_date <= 9.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- arrival_date > 9.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | |--- arrival_date > 16.50
| | | | | | | |--- arrival_month <= 8.50
| | | | | | | | |--- avg_price_per_room <= 127.39
| | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | |--- weights: [0.00, 50.00] class: 1
| | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- avg_price_per_room > 127.39
| | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- arrival_month > 8.50
| | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | |--- avg_price_per_room <= 101.34
| | | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | | |--- avg_price_per_room > 101.34
| | | | | | | | | | |--- avg_price_per_room <= 177.83
| | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 177.83
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | |--- lead_time > 117.50
| | | | | |--- no_of_week_nights <= 1.50
| | | | | | |--- arrival_date <= 7.50
| | | | | | | |--- weights: [51.00, 0.00] class: 0
| | | | | | |--- arrival_date > 7.50
| | | | | | | |--- avg_price_per_room <= 93.58
| | | | | | | | |--- avg_price_per_room <= 65.38
| | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | |--- avg_price_per_room > 65.38
| | | | | | | | | |--- avg_price_per_room <= 89.88
| | | | | | | | | | |--- weights: [24.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 89.88
| | | | | | | | | | |--- avg_price_per_room <= 91.33
| | | | | | | | | | | |--- weights: [8.00, 2.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 91.33
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 93.58
| | | | | | | | |--- arrival_date <= 28.00
| | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | | |--- no_of_adults <= 2.50
| | | | | | | | | | | |--- weights: [0.00, 17.00] class: 1
| | | | | | | | | | |--- no_of_adults > 2.50
| | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0
| | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | | |--- arrival_month <= 7.00
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 7.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- arrival_date > 28.00
| | | | | | | | | |--- weights: [13.00, 1.00] class: 0
| | | | | |--- no_of_week_nights > 1.50
| | | | | | |--- no_of_adults <= 1.50
| | | | | | | |--- weights: [113.00, 0.00] class: 0
| | | | | | |--- no_of_adults > 1.50
| | | | | | | |--- lead_time <= 125.50
| | | | | | | | |--- avg_price_per_room <= 90.85
| | | | | | | | | |--- avg_price_per_room <= 87.50
| | | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- avg_price_per_room > 87.50
| | | | | | | | | | |--- weights: [0.00, 10.00] class: 1
| | | | | | | | |--- avg_price_per_room > 90.85
| | | | | | | | | |--- weights: [14.00, 0.00] class: 0
| | | | | | | |--- lead_time > 125.50
| | | | | | | | |--- avg_price_per_room <= 216.00
| | | | | | | | | |--- arrival_date <= 19.50
| | | | | | | | | | |--- arrival_date <= 10.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- arrival_date > 10.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | |--- arrival_date > 19.50
| | | | | | | | | | |--- lead_time <= 128.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 128.00
| | | | | | | | | | | |--- weights: [75.00, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 216.00
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | |--- market_segment_type_Online > 0.50
| | | |--- lead_time <= 13.50
| | | | |--- avg_price_per_room <= 202.67
| | | | | |--- lead_time <= 3.50
| | | | | | |--- arrival_month <= 5.50
| | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | |--- weights: [56.00, 0.00] class: 0
| | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | |--- avg_price_per_room <= 77.50
| | | | | | | | | | |--- weights: [24.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 77.50
| | | | | | | | | | |--- arrival_date <= 26.50
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | | | |--- arrival_date > 26.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | |--- arrival_date <= 25.50
| | | | | | | | | |--- avg_price_per_room <= 134.22
| | | | | | | | | | |--- lead_time <= 2.50
| | | | | | | | | | | |--- weights: [17.00, 0.00] class: 0
| | | | | | | | | | |--- lead_time > 2.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 134.22
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | |--- arrival_date > 25.50
| | | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | | |--- weights: [0.00, 14.00] class: 1
| | | | | | |--- arrival_month > 5.50
| | | | | | | |--- no_of_week_nights <= 8.50
| | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | |--- avg_price_per_room <= 76.35
| | | | | | | | | | |--- avg_price_per_room <= 74.40
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- avg_price_per_room > 74.40
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- avg_price_per_room > 76.35
| | | | | | | | | | |--- avg_price_per_room <= 118.04
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- avg_price_per_room > 118.04
| | | | | | | | | | | |--- truncated branch of depth 16
| | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | |--- avg_price_per_room <= 178.00
| | | | | | | | | | |--- lead_time <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- lead_time > 1.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- avg_price_per_room > 178.00
| | | | | | | | | | |--- lead_time <= 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- lead_time > 0.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- no_of_week_nights > 8.50
| | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- lead_time > 3.50
| | | | | | |--- avg_price_per_room <= 99.38
| | | | | | | |--- avg_price_per_room <= 78.90
| | | | | | | | |--- no_of_week_nights <= 11.00
| | | | | | | | | |--- no_of_week_nights <= 5.50
| | | | | | | | | | |--- arrival_date <= 23.50
| | | | | | | | | | | |--- weights: [100.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 23.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- no_of_week_nights > 5.50
| | | | | | | | | | |--- avg_price_per_room <= 77.18
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 77.18
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- no_of_week_nights > 11.00
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- avg_price_per_room > 78.90
| | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | | |--- weights: [23.00, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | | |--- no_of_week_nights <= 4.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | | |--- no_of_week_nights > 4.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | |--- weights: [42.00, 0.00] class: 0
| | | | | | |--- avg_price_per_room > 99.38
| | | | | | | |--- arrival_month <= 8.50
| | | | | | | | |--- required_car_parking_space_Yes <= 0.50
| | | | | | | | | |--- avg_price_per_room <= 119.25
| | | | | | | | | | |--- avg_price_per_room <= 117.25
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- avg_price_per_room > 117.25
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 119.25
| | | | | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | |--- required_car_parking_space_Yes > 0.50
| | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | |--- arrival_month > 8.50
| | | | | | | | |--- lead_time <= 9.50
| | | | | | | | | |--- arrival_date <= 6.50
| | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- arrival_date > 6.50
| | | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | | |--- weights: [34.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 9.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- arrival_date <= 26.00
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- arrival_date > 26.00
| | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- weights: [10.00, 0.00] class: 0
| | | | |--- avg_price_per_room > 202.67
| | | | | |--- arrival_month <= 11.50
| | | | | | |--- weights: [0.00, 32.00] class: 1
| | | | | |--- arrival_month > 11.50
| | | | | | |--- weights: [1.00, 0.00] class: 0
| | | |--- lead_time > 13.50
| | | | |--- avg_price_per_room <= 105.27
| | | | | |--- avg_price_per_room <= 60.07
| | | | | | |--- lead_time <= 84.50
| | | | | | | |--- lead_time <= 51.50
| | | | | | | | |--- lead_time <= 50.50
| | | | | | | | | |--- avg_price_per_room <= 21.67
| | | | | | | | | | |--- weights: [19.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 21.67
| | | | | | | | | | |--- avg_price_per_room <= 49.84
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 49.84
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- lead_time > 50.50
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- lead_time > 51.50
| | | | | | | | |--- weights: [32.00, 0.00] class: 0
| | | | | | |--- lead_time > 84.50
| | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | |--- arrival_date <= 19.00
| | | | | | | | | |--- lead_time <= 139.00
| | | | | | | | | | |--- weights: [0.00, 8.00] class: 1
| | | | | | | | | |--- lead_time > 139.00
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- arrival_date > 19.00
| | | | | | | | | |--- lead_time <= 87.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- lead_time > 87.50
| | | | | | | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- no_of_weekend_nights > 0.50
| | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0
| | | | | | | |--- arrival_year > 2017.50
| | | | | | | | |--- avg_price_per_room <= 59.43
| | | | | | | | | |--- weights: [14.00, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 59.43
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- avg_price_per_room > 60.07
| | | | | | |--- lead_time <= 25.50
| | | | | | | |--- arrival_month <= 11.50
| | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | |--- weights: [29.00, 0.00] class: 0
| | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- lead_time <= 14.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- lead_time > 14.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | |--- arrival_month > 11.50
| | | | | | | | |--- weights: [54.00, 0.00] class: 0
| | | | | | |--- lead_time > 25.50
| | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- lead_time <= 60.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- lead_time > 60.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- required_car_parking_space_Yes <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 28
| | | | | | | | | | |--- required_car_parking_space_Yes > 0.50
| | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | |--- arrival_month <= 5.00
| | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 5.00
| | | | | | | | | | |--- no_of_week_nights <= 3.00
| | | | | | | | | | | |--- weights: [0.00, 35.00] class: 1
| | | | | | | | | | |--- no_of_week_nights > 3.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | |--- required_car_parking_space_Yes <= 0.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- arrival_date <= 9.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- arrival_date > 9.00
| | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | |--- required_car_parking_space_Yes > 0.50
| | | | | | | | | |--- weights: [6.00, 0.00] class: 0
| | | | |--- avg_price_per_room > 105.27
| | | | | |--- required_car_parking_space_Yes <= 0.50
| | | | | | |--- arrival_month <= 10.50
| | | | | | | |--- avg_price_per_room <= 195.30
| | | | | | | | |--- lead_time <= 54.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- lead_time <= 33.50
| | | | | | | | | | | |--- truncated branch of depth 17
| | | | | | | | | | |--- lead_time > 33.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | |--- lead_time > 54.50
| | | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | | |--- lead_time <= 135.50
| | | | | | | | | | | |--- truncated branch of depth 20
| | | | | | | | | | |--- lead_time > 135.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | | |--- lead_time <= 59.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- lead_time > 59.50
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | |--- avg_price_per_room > 195.30
| | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | |--- lead_time <= 59.50
| | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1
| | | | | | | | | | |--- lead_time > 59.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | |--- weights: [0.00, 92.00] class: 1
| | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | |--- arrival_month > 10.50
| | | | | | | |--- lead_time <= 22.50
| | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | |--- weights: [22.00, 0.00] class: 0
| | | | | | | |--- lead_time > 22.50
| | | | | | | | |--- avg_price_per_room <= 168.06
| | | | | | | | | |--- avg_price_per_room <= 147.75
| | | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- avg_price_per_room > 147.75
| | | | | | | | | | |--- weights: [0.00, 15.00] class: 1
| | | | | | | | |--- avg_price_per_room > 168.06
| | | | | | | | | |--- no_of_week_nights <= 6.50
| | | | | | | | | | |--- lead_time <= 80.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- lead_time > 80.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- no_of_week_nights > 6.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- required_car_parking_space_Yes > 0.50
| | | | | | |--- no_of_week_nights <= 8.00
| | | | | | | |--- weights: [39.00, 0.00] class: 0
| | | | | | |--- no_of_week_nights > 8.00
| | | | | | | |--- weights: [0.00, 1.00] class: 1
| |--- no_of_special_requests > 0.50
| | |--- no_of_special_requests <= 1.50
| | | |--- market_segment_type_Online <= 0.50
| | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | |--- lead_time <= 102.50
| | | | | | |--- no_of_week_nights <= 11.00
| | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | |--- lead_time <= 91.50
| | | | | | | | | |--- avg_price_per_room <= 129.50
| | | | | | | | | | |--- weights: [848.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 129.50
| | | | | | | | | | |--- avg_price_per_room <= 131.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 131.50
| | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 91.50
| | | | | | | | | |--- no_of_children <= 0.50
| | | | | | | | | | |--- weights: [43.00, 0.00] class: 0
| | | | | | | | | |--- no_of_children > 0.50
| | | | | | | | | | |--- lead_time <= 95.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | | |--- lead_time > 95.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | |--- avg_price_per_room <= 164.79
| | | | | | | | | |--- avg_price_per_room <= 138.55
| | | | | | | | | | |--- weights: [11.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 138.55
| | | | | | | | | | |--- market_segment_type_Corporate <= 0.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | | |--- market_segment_type_Corporate > 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- avg_price_per_room > 164.79
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | |--- no_of_week_nights > 11.00
| | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- lead_time > 102.50
| | | | | | |--- lead_time <= 104.50
| | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | |--- avg_price_per_room <= 67.65
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 67.65
| | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | |--- lead_time > 104.50
| | | | | | | |--- avg_price_per_room <= 141.75
| | | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | | |--- avg_price_per_room <= 83.39
| | | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- avg_price_per_room > 83.39
| | | | | | | | | | |--- lead_time <= 143.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- lead_time > 143.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | | |--- avg_price_per_room <= 122.00
| | | | | | | | | | |--- weights: [54.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 122.00
| | | | | | | | | | |--- avg_price_per_room <= 131.75
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- avg_price_per_room > 131.75
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 141.75
| | | | | | | | |--- lead_time <= 110.50
| | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | |--- lead_time > 110.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | |--- lead_time <= 63.00
| | | | | | |--- market_segment_type_Corporate <= 0.50
| | | | | | | |--- weights: [18.00, 0.00] class: 0
| | | | | | |--- market_segment_type_Corporate > 0.50
| | | | | | | |--- lead_time <= 12.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- lead_time > 12.50
| | | | | | | | |--- weights: [2.00, 1.00] class: 0
| | | | | |--- lead_time > 63.00
| | | | | | |--- weights: [0.00, 6.00] class: 1
| | | |--- market_segment_type_Online > 0.50
| | | | |--- lead_time <= 8.50
| | | | | |--- lead_time <= 4.50
| | | | | | |--- no_of_week_nights <= 10.00
| | | | | | | |--- avg_price_per_room <= 219.86
| | | | | | | | |--- avg_price_per_room <= 157.64
| | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50
| | | | | | | | | | |--- arrival_date <= 4.50
| | | | | | | | | | | |--- weights: [81.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 4.50
| | | | | | | | | | | |--- truncated branch of depth 16
| | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50
| | | | | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- avg_price_per_room > 157.64
| | | | | | | | | |--- avg_price_per_room <= 158.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- avg_price_per_room > 158.50
| | | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | |--- avg_price_per_room > 219.86
| | | | | | | | |--- avg_price_per_room <= 223.58
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- avg_price_per_room > 223.58
| | | | | | | | | |--- avg_price_per_room <= 237.25
| | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 237.25
| | | | | | | | | | |--- arrival_date <= 11.50
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 11.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | |--- no_of_week_nights > 10.00
| | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | |--- lead_time > 4.50
| | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50
| | | | | | | |--- avg_price_per_room <= 123.60
| | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | |--- arrival_date <= 13.50
| | | | | | | | | | |--- arrival_month <= 4.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- arrival_month > 4.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- arrival_date > 13.50
| | | | | | | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- no_of_weekend_nights > 0.50
| | | | | | | | | | | |--- weights: [37.00, 0.00] class: 0
| | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | |--- weights: [95.00, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 123.60
| | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | |--- arrival_date <= 15.50
| | | | | | | | | | |--- avg_price_per_room <= 128.91
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- avg_price_per_room > 128.91
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- arrival_date > 15.50
| | | | | | | | | | |--- lead_time <= 6.50
| | | | | | | | | | | |--- weights: [42.00, 0.00] class: 0
| | | | | | | | | | |--- lead_time > 6.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | |--- lead_time <= 6.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- lead_time > 6.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | |--- room_type_reserved_Room_Type 2 > 0.50
| | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | |--- lead_time > 8.50
| | | | | |--- required_car_parking_space_Yes <= 0.50
| | | | | | |--- avg_price_per_room <= 127.62
| | | | | | | |--- no_of_weekend_nights <= 2.50
| | | | | | | | |--- lead_time <= 43.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | | | |--- weights: [87.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | | | |--- truncated branch of depth 23
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- weights: [128.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 43.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- arrival_month <= 7.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- arrival_month > 7.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | | | |--- truncated branch of depth 20
| | | | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | | | |--- truncated branch of depth 21
| | | | | | | |--- no_of_weekend_nights > 2.50
| | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | |--- avg_price_per_room <= 119.12
| | | | | | | | | | |--- no_of_week_nights <= 8.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_week_nights > 8.50
| | | | | | | | | | | |--- weights: [0.00, 12.00] class: 1
| | | | | | | | | |--- avg_price_per_room > 119.12
| | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | |--- avg_price_per_room > 127.62
| | | | | | | |--- lead_time <= 142.50
| | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | |--- arrival_date <= 19.50
| | | | | | | | | | |--- avg_price_per_room <= 177.15
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | | |--- avg_price_per_room > 177.15
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- arrival_date > 19.50
| | | | | | | | | | |--- arrival_date <= 27.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | | |--- arrival_date > 27.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | | |--- truncated branch of depth 19
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- lead_time <= 100.50
| | | | | | | | | | | |--- weights: [49.00, 0.00] class: 0
| | | | | | | | | | |--- lead_time > 100.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- lead_time > 142.50
| | | | | | | | |--- avg_price_per_room <= 142.65
| | | | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 142.65
| | | | | | | | | |--- avg_price_per_room <= 182.49
| | | | | | | | | | |--- weights: [0.00, 11.00] class: 1
| | | | | | | | | |--- avg_price_per_room > 182.49
| | | | | | | | | | |--- arrival_date <= 9.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- arrival_date > 9.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | |--- required_car_parking_space_Yes > 0.50
| | | | | | |--- no_of_weekend_nights <= 3.00
| | | | | | | |--- weights: [180.00, 0.00] class: 0
| | | | | | |--- no_of_weekend_nights > 3.00
| | | | | | | |--- weights: [0.00, 1.00] class: 1
| | |--- no_of_special_requests > 1.50
| | | |--- lead_time <= 90.50
| | | | |--- no_of_week_nights <= 3.50
| | | | | |--- weights: [2126.00, 0.00] class: 0
| | | | |--- no_of_week_nights > 3.50
| | | | | |--- no_of_week_nights <= 9.50
| | | | | | |--- no_of_special_requests <= 2.50
| | | | | | | |--- lead_time <= 6.50
| | | | | | | | |--- weights: [43.00, 0.00] class: 0
| | | | | | | |--- lead_time > 6.50
| | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- arrival_date <= 3.00
| | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 3.00
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- weights: [17.00, 0.00] class: 0
| | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50
| | | | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | | | |--- weights: [34.00, 0.00] class: 0
| | | | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | | | |--- lead_time <= 80.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- lead_time > 80.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | |--- no_of_special_requests > 2.50
| | | | | | | |--- weights: [70.00, 0.00] class: 0
| | | | | |--- no_of_week_nights > 9.50
| | | | | | |--- weights: [0.00, 2.00] class: 1
| | | |--- lead_time > 90.50
| | | | |--- avg_price_per_room <= 202.95
| | | | | |--- arrival_month <= 8.50
| | | | | | |--- arrival_year <= 2017.50
| | | | | | | |--- arrival_month <= 7.50
| | | | | | | | |--- arrival_date <= 4.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- arrival_date > 4.50
| | | | | | | | | |--- arrival_date <= 26.00
| | | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | | | | |--- arrival_date > 26.00
| | | | | | | | | | |--- avg_price_per_room <= 91.62
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 91.62
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- arrival_month > 7.50
| | | | | | | | |--- arrival_date <= 24.50
| | | | | | | | | |--- lead_time <= 98.50
| | | | | | | | | | |--- no_of_children <= 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- no_of_children > 0.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- lead_time > 98.50
| | | | | | | | | | |--- weights: [11.00, 0.00] class: 0
| | | | | | | | |--- arrival_date > 24.50
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | |--- arrival_year > 2017.50
| | | | | | | |--- lead_time <= 150.50
| | | | | | | | |--- no_of_children <= 0.50
| | | | | | | | | |--- arrival_date <= 29.50
| | | | | | | | | | |--- avg_price_per_room <= 157.65
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- avg_price_per_room > 157.65
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- arrival_date > 29.50
| | | | | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- no_of_children > 0.50
| | | | | | | | | |--- arrival_month <= 4.50
| | | | | | | | | | |--- no_of_weekend_nights <= 1.00
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | | |--- no_of_weekend_nights > 1.00
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | |--- arrival_month > 4.50
| | | | | | | | | | |--- no_of_week_nights <= 4.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- no_of_week_nights > 4.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | |--- lead_time > 150.50
| | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | |--- arrival_month > 8.50
| | | | | | |--- no_of_special_requests <= 2.50
| | | | | | | |--- avg_price_per_room <= 90.42
| | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | |--- lead_time <= 107.00
| | | | | | | | | | |--- avg_price_per_room <= 70.52
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 70.52
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- lead_time > 107.00
| | | | | | | | | | |--- arrival_date <= 17.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_date > 17.00
| | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0
| | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | |--- lead_time <= 101.00
| | | | | | | | | | |--- weights: [11.00, 0.00] class: 0
| | | | | | | | | |--- lead_time > 101.00
| | | | | | | | | | |--- arrival_date <= 7.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | | |--- arrival_date > 7.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | |--- avg_price_per_room > 90.42
| | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | |--- weights: [11.00, 0.00] class: 0
| | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | |--- avg_price_per_room <= 153.15
| | | | | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- avg_price_per_room > 153.15
| | | | | | | | | | |--- arrival_date <= 26.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_date > 26.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | |--- no_of_special_requests > 2.50
| | | | | | | |--- weights: [52.00, 0.00] class: 0
| | | | |--- avg_price_per_room > 202.95
| | | | | |--- weights: [0.00, 7.00] class: 1
|--- lead_time > 151.50
| |--- avg_price_per_room <= 100.04
| | |--- no_of_special_requests <= 0.50
| | | |--- market_segment_type_Online <= 0.50
| | | | |--- no_of_adults <= 1.50
| | | | | |--- lead_time <= 163.50
| | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | |--- avg_price_per_room <= 62.50
| | | | | | | | |--- weights: [1.00, 1.00] class: 0
| | | | | | | |--- avg_price_per_room > 62.50
| | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | |--- weights: [0.00, 15.00] class: 1
| | | | | |--- lead_time > 163.50
| | | | | | |--- lead_time <= 341.00
| | | | | | | |--- lead_time <= 173.00
| | | | | | | | |--- arrival_date <= 3.50
| | | | | | | | | |--- avg_price_per_room <= 88.25
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 88.25
| | | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | | |--- weights: [61.00, 6.00] class: 0
| | | | | | | | |--- arrival_date > 3.50
| | | | | | | | | |--- avg_price_per_room <= 70.85
|
| | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 70.85 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | | |--- avg_price_per_room <= 55.21 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 55.21 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 98.00 | | | | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 13.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | |--- avg_price_per_room <= 88.33 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | |--- avg_price_per_room > 88.33 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | |--- arrival_date > 8.50 | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | |--- avg_price_per_room <= 80.00 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 80.00 | | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | | |--- weights: [3.00, 2.00] class: 0 | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | |--- no_of_adults > 1.50 | | | | | |--- avg_price_per_room <= 84.58 | | | | | | |--- lead_time <= 244.00 | | | | | | | |--- no_of_week_nights <= 
1.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- lead_time <= 166.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 166.50 | | | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [24.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_date <= 16.00 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- arrival_date > 16.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 75.75 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 75.75 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- lead_time > 244.00 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [34.00, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | 
| | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [37.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 84.58 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | |--- weights: [0.00, 13.00] class: 1 | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- avg_price_per_room <= 2.50 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- lead_time <= 205.00 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- lead_time > 205.00 | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | |--- arrival_date > 19.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 2.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- weights: [0.00, 525.00] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | |--- lead_time 
<= 263.50 | | | | | | | | |--- avg_price_per_room <= 76.87 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- avg_price_per_room > 76.87 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- lead_time > 263.50 | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [0.00, 58.00] class: 1 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | |--- lead_time <= 156.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 156.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 23.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | |--- lead_time > 159.50 | | | | | | |--- no_of_adults <= 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_adults > 0.50 | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 1.50 | | | | | | | | |--- weights: [48.00, 0.00] class: 0 | | | | |--- lead_time > 180.50 | | | | | |--- 
market_segment_type_Offline <= 0.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | |--- weights: [0.00, 125.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- lead_time <= 300.50 | | | | | | | | | |--- lead_time <= 226.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- lead_time > 226.50 | | | | | | | | | | |--- lead_time <= 272.00 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 272.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- lead_time > 300.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | |--- avg_price_per_room <= 96.37 | | | | | | | |--- lead_time <= 356.00 | | | | | | | | |--- lead_time <= 302.50 | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 302.50 | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | |--- lead_time > 356.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- avg_price_per_room > 96.37 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Offline <= 0.50 | | | | | |--- no_of_week_nights <= 9.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | |--- avg_price_per_room <= 81.12 | | | | | | | | | |--- lead_time <= 153.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | 
|--- lead_time > 153.50 | | | | | | | | | | |--- lead_time <= 157.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 157.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 81.12 | | | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | | | |--- lead_time <= 233.00 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- lead_time > 233.00 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | | | |--- lead_time <= 204.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 204.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- arrival_date > 27.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- lead_time <= 224.50 | | | | | | | | | | |--- lead_time <= 175.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 175.50 | | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | | |--- lead_time > 224.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- lead_time <= 269.00 | | | | | | | | | | |--- lead_time <= 176.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 176.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time > 269.00 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | |--- lead_time <= 217.50 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 217.50 | | | | | | | | | | |--- no_of_weekend_nights <= 
1.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_date > 14.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | | |--- lead_time <= 264.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 264.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- lead_time <= 281.50 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 281.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- no_of_week_nights > 9.50 | | | | | | |--- lead_time <= 198.00 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time > 198.00 | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | |--- market_segment_type_Offline > 0.50 | | | | | |--- lead_time <= 348.50 | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | |--- arrival_date <= 30.00 | | | | | | | | |--- weights: [137.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 30.00 | | | | | | | | |--- no_of_week_nights <= 3.00 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 3.00 | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | |--- weights: [1.00, 0.00] 
class: 0 | | | | | | | |--- arrival_date > 21.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time > 348.50 | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | |--- avg_price_per_room <= 58.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 58.50 | | | | | | | | |--- weights: [6.00, 2.00] class: 0 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.50 | | | | |--- weights: [0.00, 2108.00] class: 1 | | | |--- no_of_special_requests > 2.50 | | | | |--- weights: [31.00, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [47.00, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.50 | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | |--- arrival_date > 24.50 | | | | | |--- lead_time <= 172.50 | | | | | | |--- arrival_date <= 28.00 | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- arrival_date > 28.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time > 172.50 | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1
# Printing the Gini importance of the predictor variables
print(
pd.DataFrame(
model.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| lead_time | 0.35225 |
| avg_price_per_room | 0.17530 |
| market_segment_type_Online | 0.09247 |
| arrival_date | 0.08556 |
| no_of_special_requests | 0.06797 |
| arrival_month | 0.06395 |
| no_of_week_nights | 0.04344 |
| no_of_weekend_nights | 0.04112 |
| no_of_adults | 0.02656 |
| arrival_year | 0.01192 |
| type_of_meal_plan_Not Selected | 0.00757 |
| room_type_reserved_Room_Type 4 | 0.00719 |
| required_car_parking_space_Yes | 0.00714 |
| type_of_meal_plan_Meal Plan 2 | 0.00391 |
| no_of_children | 0.00372 |
| market_segment_type_Offline | 0.00360 |
| room_type_reserved_Room_Type 2 | 0.00191 |
| room_type_reserved_Room_Type 5 | 0.00179 |
| room_type_reserved_Room_Type 6 | 0.00077 |
| market_segment_type_Corporate | 0.00071 |
| repeated_guest_Yes | 0.00051 |
| no_of_previous_bookings_not_canceled | 0.00040 |
| room_type_reserved_Room_Type 7 | 0.00026 |
| market_segment_type_Complementary | 0.00000 |
| room_type_reserved_Room_Type 3 | 0.00000 |
| no_of_previous_cancellations | 0.00000 |
| type_of_meal_plan_Meal Plan 3 | 0.00000 |
# level of importance of the predictor variables
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
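To pre-prune the tree, we will use two different methods: first limiting the maximum depth directly, and then tuning several hyperparameters with a grid search.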
model1 = DecisionTreeClassifier(random_state=1, max_depth=5, class_weight="balanced")
model1.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', max_depth=5, random_state=1)
confusion_matrix_sklearn(model1, X_train, y_train)
decision_tree_pretune1_perf_train = model_performance_classification_sklearn(
model1, X_train, y_train
)
decision_tree_pretune1_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.82239 | 0.77317 | 0.71219 | 0.74143 |
confusion_matrix_sklearn(model1, X_test, y_test)
decision_tree_pretune1_perf_test = model_performance_classification_sklearn(
model1, X_test, y_test
)
decision_tree_pretune1_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.82395 | 0.77030 | 0.71021 | 0.73904 |
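The F1 column in these tables is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes it from the test-set precision and recall reported above; the helper name `f1_from_precision_recall` is introduced here for illustration only and is not part of the notebook.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set precision and recall reported above for the depth-limited tree
f1_test = f1_from_precision_recall(precision=0.71021, recall=0.77030)
print(round(f1_test, 5))
```

The result should agree with the reported F1 up to rounding of the inputs.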
plt.figure(figsize=(20, 15))
out = tree.plot_tree(
model1,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of the decision tree
print(tree.export_text(model1, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |--- weights: [1737.14, 157.88] class: 0
|   |   |   |   |--- no_of_weekend_nights > 0.50
|   |   |   |   |   |--- weights: [1090.00, 384.08] class: 0
|   |   |   |--- lead_time > 90.50
|   |   |   |   |--- lead_time <= 117.50
|   |   |   |   |   |--- weights: [297.48, 513.12] class: 1
|   |   |   |   |--- lead_time > 117.50
|   |   |   |   |   |--- weights: [315.37, 130.56] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- lead_time <= 13.50
|   |   |   |   |--- avg_price_per_room <= 99.44
|   |   |   |   |   |--- weights: [456.28, 132.08] class: 0
|   |   |   |   |--- avg_price_per_room > 99.44
|   |   |   |   |   |--- weights: [352.65, 365.87] class: 1
|   |   |   |--- lead_time > 13.50
|   |   |   |   |--- required_car_parking_space_Yes <= 0.50
|   |   |   |   |   |--- weights: [1009.48, 3702.68] class: 1
|   |   |   |   |--- required_car_parking_space_Yes > 0.50
|   |   |   |   |   |--- weights: [48.46, 1.52] class: 0
|   |--- no_of_special_requests > 0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- lead_time <= 102.50
|   |   |   |   |   |--- weights: [712.75, 18.22] class: 0
|   |   |   |   |--- lead_time > 102.50
|   |   |   |   |   |--- weights: [76.79, 22.77] class: 0
|   |   |   |--- market_segment_type_Online > 0.50
|   |   |   |   |--- lead_time <= 8.50
|   |   |   |   |   |--- weights: [756.73, 107.79] class: 0
|   |   |   |   |--- lead_time > 8.50
|   |   |   |   |   |--- weights: [2646.71, 1452.84] class: 0
|   |   |--- no_of_special_requests > 1.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |--- weights: [1585.04, 0.00] class: 0
|   |   |   |   |--- no_of_week_nights > 3.50
|   |   |   |   |   |--- weights: [232.61, 57.69] class: 0
|   |   |   |--- lead_time > 90.50
|   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |--- weights: [291.51, 162.44] class: 0
|   |   |   |   |--- no_of_special_requests > 2.50
|   |   |   |   |   |--- weights: [67.10, 0.00] class: 0
|--- lead_time > 151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- weights: [261.69, 86.53] class: 0
|   |   |   |   |--- market_segment_type_Online > 0.50
|   |   |   |   |   |--- weights: [9.69, 100.20] class: 1
|   |   |   |--- no_of_adults > 1.50
|   |   |   |   |--- avg_price_per_room <= 82.47
|   |   |   |   |   |--- weights: [216.96, 667.97] class: 1
|   |   |   |   |--- avg_price_per_room > 82.47
|   |   |   |   |   |--- weights: [29.08, 1030.80] class: 1
|   |   |--- no_of_special_requests > 0.50
|   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |--- lead_time <= 180.50
|   |   |   |   |   |--- weights: [44.73, 12.14] class: 0
|   |   |   |   |--- lead_time > 180.50
|   |   |   |   |   |--- weights: [29.08, 212.54] class: 1
|   |   |   |--- no_of_weekend_nights > 0.50
|   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |--- weights: [250.51, 145.74] class: 0
|   |   |   |   |--- market_segment_type_Offline > 0.50
|   |   |   |   |   |--- weights: [112.58, 7.59] class: 0
|   |--- avg_price_per_room > 100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- room_type_reserved_Room_Type 2 <= 0.50
|   |   |   |   |   |--- weights: [0.00, 3147.05] class: 1
|   |   |   |   |--- room_type_reserved_Room_Type 2 > 0.50
|   |   |   |   |   |--- weights: [0.00, 53.13] class: 1
|   |   |   |--- no_of_special_requests > 2.50
|   |   |   |   |--- weights: [23.11, 0.00] class: 0
|   |   |--- arrival_month > 11.50
|   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- weights: [8.95, 0.00] class: 0
|   |   |   |   |--- market_segment_type_Online > 0.50
|   |   |   |   |   |--- weights: [26.09, 0.00] class: 0
|   |   |   |--- no_of_special_requests > 0.50
|   |   |   |   |--- arrival_date <= 24.50
|   |   |   |   |   |--- weights: [3.73, 0.00] class: 0
|   |   |   |   |--- arrival_date > 24.50
|   |   |   |   |   |--- weights: [3.73, 22.77] class: 1
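The leaf weights in this report are fractional because the tree was fitted with `class_weight="balanced"`, which scales each sample by `n_samples / (n_classes * class_count)`. The sketch below illustrates that formula in plain Python and shows how a leaf's predicted class follows from its weighted counts. The label counts (700 and 300) are hypothetical and are not taken from the booking data; the helper names are likewise introduced only for this illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def leaf_prediction(weights):
    """Return the index of the class with the largest weighted count in a leaf."""
    return max(range(len(weights)), key=lambda i: weights[i])


# Hypothetical labels: 700 not-canceled (0) and 300 canceled (1) bookings
labels = [0] * 700 + [1] * 300
class_weights = balanced_class_weights(labels)
print(class_weights)

# After weighting, both classes carry the same total mass
weighted_mass = {cls: class_weights[cls] * labels.count(cls) for cls in class_weights}
print(weighted_mass)

# A leaf from the printed rules: weights [297.48, 513.12] is labelled class 1
print(leaf_prediction([297.48, 513.12]))
```

Because the minority class is up-weighted, a leaf can be labelled class 1 even when it holds fewer canceled bookings than non-canceled ones in raw counts.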
# Printing the Gini importance of the predictor variables
print(
pd.DataFrame(
model1.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| lead_time | 0.48716 |
| market_segment_type_Online | 0.19926 |
| no_of_special_requests | 0.17932 |
| avg_price_per_room | 0.05902 |
| no_of_adults | 0.02737 |
| no_of_weekend_nights | 0.02222 |
| required_car_parking_space_Yes | 0.00960 |
| arrival_month | 0.00902 |
| no_of_week_nights | 0.00330 |
| market_segment_type_Offline | 0.00291 |
| arrival_date | 0.00082 |
| room_type_reserved_Room_Type 2 | 0.00000 |
| room_type_reserved_Room_Type 6 | 0.00000 |
| market_segment_type_Corporate | 0.00000 |
| room_type_reserved_Room_Type 7 | 0.00000 |
| market_segment_type_Complementary | 0.00000 |
| room_type_reserved_Room_Type 4 | 0.00000 |
| room_type_reserved_Room_Type 5 | 0.00000 |
| type_of_meal_plan_Meal Plan 3 | 0.00000 |
| room_type_reserved_Room_Type 3 | 0.00000 |
| type_of_meal_plan_Not Selected | 0.00000 |
| no_of_children | 0.00000 |
| type_of_meal_plan_Meal Plan 2 | 0.00000 |
| no_of_previous_bookings_not_canceled | 0.00000 |
| no_of_previous_cancellations | 0.00000 |
| arrival_year | 0.00000 |
| repeated_guest_Yes | 0.00000 |
# level of importance of the predictor variables
importances = model1.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
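To put a number on how concentrated these importances are, the sketch below sums the share held by the top features. The values are copied from the importance output printed above for the depth-limited tree (zero-importance features omitted); the names `reported_importances` and `top_k_share` are introduced here for illustration only.

```python
# Non-zero importances copied from the printed output for model1
reported_importances = {
    "lead_time": 0.48716,
    "market_segment_type_Online": 0.19926,
    "no_of_special_requests": 0.17932,
    "avg_price_per_room": 0.05902,
    "no_of_adults": 0.02737,
    "no_of_weekend_nights": 0.02222,
    "required_car_parking_space_Yes": 0.00960,
    "arrival_month": 0.00902,
    "no_of_week_nights": 0.00330,
    "market_segment_type_Offline": 0.00291,
    "arrival_date": 0.00082,
}


def top_k_share(importances, k):
    """Return the k most important features and their combined importance."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    return [name for name, _ in top], sum(value for _, value in top)


top_features, share = top_k_share(reported_importances, k=3)
print(top_features, round(share, 5))
```

With these reported values, the three leading features account for roughly 87% of the total importance in the depth-limited tree.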
According to this decision tree model, the three most important variables for predicting whether a booking will be canceled are, in order of importance, lead_time, market_segment_type_Online and no_of_special_requests.
The model performs comparably on the train and test data (F1 of 0.741 and 0.739, respectively), which suggests that it is not overfitting.
But let's see if we can improve the model further by using another method to pre-prune the tree.
Using GridSearch for Hyperparameter tuning of our tree model
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1, class_weight="balanced")
# Grid of parameters to choose from
parameters = {
"criterion": ["entropy", "gini"],
"max_depth": np.arange(2, 11, 2),
"max_leaf_nodes": [50, 75, 150, 250],
"min_samples_split": [10, 30, 50, 70],
}
# Type of scoring used to compare parameter combinations (F1 score, despite the variable name)
acc_scorer = make_scorer(f1_score)
# Run the grid search with 5-fold cross-validation
grid_obj = GridSearchCV(estimator, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the estimator to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best estimator to the data (GridSearchCV already refits it by default,
# so this call is only for explicitness)
estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', max_depth=10,
max_leaf_nodes=150, min_samples_split=10,
random_state=1)
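To get a feel for the cost of this search, the sketch below counts how many candidate settings the grid contains and how many cross-validated fits that implies. It rebuilds the same grid with plain Python ranges instead of `np.arange`; the names `param_grid`, `combinations` and `selected` are introduced here for illustration only. The `criterion` of the selected model is assumed to be `"gini"`, since the printed estimator omits it and scikit-learn omits parameters left at their default.

```python
from itertools import product

# Same grid as in the search above, written with plain Python ranges
param_grid = {
    "criterion": ["entropy", "gini"],
    "max_depth": list(range(2, 11, 2)),  # 2, 4, 6, 8, 10
    "max_leaf_nodes": [50, 75, 150, 250],
    "min_samples_split": [10, 30, 50, 70],
}
cv_folds = 5

# Every combination of one value per hyperparameter
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_candidates = len(combinations)
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)

# Settings shown in the printed estimator (criterion assumed to be the default)
selected = {
    "criterion": "gini",
    "max_depth": 10,
    "max_leaf_nodes": 150,
    "min_samples_split": 10,
}
print(selected in combinations)
print(selected["max_depth"] == max(param_grid["max_depth"]))
```

The selected `max_depth` sits at the upper edge of the grid, so a wider depth range might be worth exploring, although deeper trees also raise the risk of overfitting.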
confusion_matrix_sklearn(estimator, X_train, y_train)
decision_tree_pretune2_perf_train = model_performance_classification_sklearn(
estimator, X_train, y_train
)
decision_tree_pretune2_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.87295 | 0.84132 | 0.78747 | 0.81350 |
confusion_matrix_sklearn(estimator, X_test, y_test)
decision_tree_pretune2_perf_test = model_performance_classification_sklearn(
estimator, X_test, y_test
)
decision_tree_pretune2_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.86116 | 0.82567 | 0.76426 | 0.79378 |
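With both pre-pruned trees evaluated, a side-by-side view makes the trade-off easier to read. The sketch below uses only the metric values reported in the tables above and computes the train-test gap for each model, plus the gain in test F1 from the grid search. The dictionary layout and the labels `"max_depth=5"` and `"grid search"` are choices made for this illustration.

```python
# Metrics copied from the performance tables above
reported = {
    "max_depth=5": {
        "train": {"Accuracy": 0.82239, "Recall": 0.77317, "Precision": 0.71219, "F1": 0.74143},
        "test": {"Accuracy": 0.82395, "Recall": 0.77030, "Precision": 0.71021, "F1": 0.73904},
    },
    "grid search": {
        "train": {"Accuracy": 0.87295, "Recall": 0.84132, "Precision": 0.78747, "F1": 0.81350},
        "test": {"Accuracy": 0.86116, "Recall": 0.82567, "Precision": 0.76426, "F1": 0.79378},
    },
}

# Train minus test for every metric; larger positive gaps hint at overfitting
gaps = {
    model: {m: scores["train"][m] - scores["test"][m] for m in scores["train"]}
    for model, scores in reported.items()
}
for model, gap in gaps.items():
    print(model, {m: round(v, 5) for m, v in gap.items()})

# Gain in test F1 from the grid-searched tree over the depth-limited tree
f1_gain = reported["grid search"]["test"]["F1"] - reported["max_depth=5"]["test"]["F1"]
print(round(f1_gain, 5))
```

On these reported numbers the grid-searched tree scores higher on every test metric, at the cost of a somewhat wider train-test gap than the depth-limited tree.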
plt.figure(figsize=(20, 15))
out = tree.plot_tree(
estimator,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of the decision tree
print(tree.export_text(estimator, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- avg_price_per_room <= 196.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time <= 16.50 | | | | | | | | |--- avg_price_per_room <= 68.50 | | | | | | | | | |--- weights: [207.26, 10.63] class: 0 | | | | | | | | |--- avg_price_per_room > 68.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- weights: [206.52, 47.06] class: 0 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- weights: [2.24, 7.59] class: 1 | | | | | | | |--- lead_time > 16.50 | | | | | | | | |--- avg_price_per_room <= 135.00 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [99.16, 54.65] class: 0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [21.62, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 135.00 | | | | | | | | | |--- weights: [0.00, 12.14] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- weights: [1199.59, 0.00] class: 0 | | | | | |--- avg_price_per_room > 196.50 | | | | | | |--- weights: [0.75, 25.81] class: 1 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- lead_time <= 68.50 | | | | | | |--- arrival_month <= 9.50 | | | | | | | |--- avg_price_per_room <= 63.29 | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [41.75, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.75, 3.04] class: 1 | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | |--- avg_price_per_room <= 59.75 | | | | | | | | | | |--- weights: [16.40, 13.66] class: 0 | | | | | | | | | |--- avg_price_per_room > 59.75 | | | | | | | | | | |--- weights: [4.47, 59.21] class: 1 | | | | | | | |--- 
avg_price_per_room > 63.29 | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- weights: [455.53, 86.53] class: 0 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- weights: [27.59, 18.22] class: 0 | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | |--- weights: [0.75, 15.18] class: 1 | | | | | | |--- arrival_month > 9.50 | | | | | | | |--- weights: [413.04, 27.33] class: 0 | | | | | |--- lead_time > 68.50 | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | | |--- weights: [15.66, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | | |--- weights: [11.93, 25.81] class: 1 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- weights: [76.79, 12.14] class: 0 | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- weights: [8.95, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- avg_price_per_room <= 132.43 | | | | | | | | | |--- weights: [9.69, 122.97] class: 1 | | | | | | | | |--- avg_price_per_room > 132.43 | | | | | | | | | |--- weights: [6.71, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- avg_price_per_room <= 75.07 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | |--- weights: [5.96, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | |--- no_of_previous_cancellations <= 0.50 | | | | | | | | | | |--- weights: [14.91, 154.85] class: 1 | | | | | | | | | |--- no_of_previous_cancellations > 0.50 | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- weights: [60.39, 15.18] class: 0 | | | | | | |--- avg_price_per_room > 
75.07 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- weights: [59.64, 3.04] class: 0 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- weights: [1.49, 16.70] class: 1 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [11.18, 19.74] class: 1 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [56.66, 18.22] class: 0 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- arrival_date <= 11.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- weights: [16.40, 39.47] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [20.13, 6.07] class: 0 | | | | | | |--- arrival_date > 11.50 | | | | | | | |--- avg_price_per_room <= 102.09 | | | | | | | | |--- weights: [5.22, 144.22] class: 1 | | | | | | | |--- avg_price_per_room > 102.09 | | | | | | | | |--- avg_price_per_room <= 109.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [0.75, 16.70] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [33.55, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 109.50 | | | | | | | | | |--- weights: [6.71, 78.94] class: 1 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_week_nights <= 1.50 | | | | | | |--- arrival_date <= 7.50 | | | | | | | |--- weights: [38.02, 0.00] class: 0 | | | | | | |--- arrival_date > 7.50 | | | | | | | |--- avg_price_per_room <= 93.58 | | | | | | | | |--- avg_price_per_room <= 65.38 | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | | | |--- avg_price_per_room > 65.38 | | | | | | | | | |--- weights: [24.60, 3.04] class: 0 | | | | | | | |--- avg_price_per_room > 93.58 | | | | | | | | |--- arrival_date <= 28.00 | | | | | | | | | |--- weights: [14.91, 72.87] class: 1 | | | | | | | | |--- arrival_date > 28.00 | | | | | | | | | |--- weights: [9.69, 1.52] 
class: 0 | | | | | |--- no_of_week_nights > 1.50 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- weights: [84.25, 0.00] class: 0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- lead_time <= 125.50 | | | | | | | | |--- avg_price_per_room <= 90.85 | | | | | | | | | |--- avg_price_per_room <= 87.50 | | | | | | | | | | |--- weights: [13.42, 13.66] class: 1 | | | | | | | | | |--- avg_price_per_room > 87.50 | | | | | | | | | | |--- weights: [0.00, 15.18] class: 1 | | | | | | | | |--- avg_price_per_room > 90.85 | | | | | | | | | |--- weights: [10.44, 0.00] class: 0 | | | | | | | |--- lead_time > 125.50 | | | | | | | | |--- weights: [120.03, 19.74] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 99.44 | | | | | |--- arrival_month <= 1.50 | | | | | | |--- weights: [92.45, 0.00] class: 0 | | | | | |--- arrival_month > 1.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- avg_price_per_room <= 70.05 | | | | | | | | | |--- weights: [31.31, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 70.05 | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | |--- weights: [110.34, 40.99] class: 0 | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | |--- weights: [41.01, 40.99] class: 0 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- weights: [0.00, 19.74] class: 1 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [14.91, 13.66] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | |--- weights: [155.07, 6.07] class: 0 | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- weights: [3.73, 10.63] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [7.46, 0.00] class: 0 | | | | |--- avg_price_per_room > 
99.44 | | | | | |--- lead_time <= 3.50 | | | | | | |--- avg_price_per_room <= 202.67 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- weights: [219.19, 56.17] class: 0 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [0.00, 6.07] class: 1 | | | | | | |--- avg_price_per_room > 202.67 | | | | | | | |--- weights: [0.75, 22.77] class: 1 | | | | | |--- lead_time > 3.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [61.14, 232.27] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- weights: [26.09, 1.52] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | |--- weights: [9.69, 36.43] class: 1 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- weights: [20.13, 10.63] class: 0 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [15.66, 0.00] class: 0 | | | |--- lead_time > 13.50 | | | | |--- required_car_parking_space_Yes <= 0.50 | | | | | |--- avg_price_per_room <= 71.92 | | | | | | |--- avg_price_per_room <= 59.43 | | | | | | | |--- lead_time <= 84.50 | | | | | | | | |--- weights: [50.70, 7.59] class: 0 | | | | | | | |--- lead_time > 84.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [6.71, 15.18] class: 1 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- weights: [10.44, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | |--- lead_time <= 25.50 | | | | | | | | |--- weights: [20.88, 6.07] class: 0 | | | | | | | |--- lead_time > 25.50 | | | | | | | | |--- avg_price_per_room <= 71.34 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- weights: [27.59, 97.16] class: 1 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- weights: [31.31, 33.40] class: 1 | | | | | | | | |--- avg_price_per_room > 71.34 | | | 
| | | | | | |--- weights: [11.18, 0.00] class: 0 | | | | | |--- avg_price_per_room > 71.92 | | | | | | |--- arrival_year <= 2017.50 | | | | | | | |--- lead_time <= 65.50 | | | | | | | | |--- avg_price_per_room <= 120.45 | | | | | | | | | |--- weights: [79.77, 9.11] class: 0 | | | | | | | | |--- avg_price_per_room > 120.45 | | | | | | | | | |--- weights: [7.46, 12.14] class: 1 | | | | | | | |--- lead_time > 65.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- weights: [20.13, 47.06] class: 1 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- weights: [0.00, 63.76] class: 1 | | | | | | |--- arrival_year > 2017.50 | | | | | | | |--- avg_price_per_room <= 104.31 | | | | | | | | |--- lead_time <= 25.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [55.17, 118.41] class: 1 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [23.11, 0.00] class: 0 | | | | | | | | |--- lead_time > 25.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [212.48, 599.66] class: 1 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [73.81, 411.41] class: 1 | | | | | | | |--- avg_price_per_room > 104.31 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | |--- weights: [326.55, 2169.39] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | |--- weights: [11.93, 15.18] class: 1 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- avg_price_per_room <= 168.06 | | | | | | | | | | |--- weights: [27.59, 91.09] class: 1 | | | | | | | | | |--- avg_price_per_room > 168.06 | | | | | | | | | | |--- weights: [12.67, 6.07] class: 0 | | | | |--- required_car_parking_space_Yes > 0.50 | | | | | |--- weights: [48.46, 1.52] class: 0 | |--- no_of_special_requests > 0.50 | | |--- 
no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- lead_time <= 102.50 | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | |--- weights: [697.09, 9.11] class: 0 | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | |--- lead_time <= 63.00 | | | | | | | |--- weights: [15.66, 1.52] class: 0 | | | | | | |--- lead_time > 63.00 | | | | | | | |--- weights: [0.00, 7.59] class: 1 | | | | |--- lead_time > 102.50 | | | | | |--- no_of_week_nights <= 2.50 | | | | | | |--- weights: [32.06, 19.74] class: 0 | | | | | |--- no_of_week_nights > 2.50 | | | | | | |--- weights: [44.73, 3.04] class: 0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 8.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- no_of_week_nights <= 10.00 | | | | | | | |--- weights: [498.03, 40.99] class: 0 | | | | | | |--- no_of_week_nights > 10.00 | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | |--- lead_time > 4.50 | | | | | | |--- weights: [258.71, 63.76] class: 0 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space_Yes <= 0.50 | | | | | | |--- avg_price_per_room <= 118.55 | | | | | | | |--- lead_time <= 61.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [70.08, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- weights: [776.12, 335.50] class: 0 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [126.74, 1.52] class: 0 | | | | | | | |--- lead_time > 61.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- weights: [4.47, 57.69] class: 1 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- weights: [42.50, 54.65] class: 1 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- weights: [544.25, 218.61] class: 0 | | | | | | | 
| | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [136.44, 127.52] class: 0 | | | | | | |--- avg_price_per_room > 118.55 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | |--- no_of_week_nights <= 7.50 | | | | | | | | | | |--- weights: [303.44, 132.08] class: 0 | | | | | | | | | |--- no_of_week_nights > 7.50 | | | | | | | | | | |--- weights: [0.00, 6.07] class: 1 | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | |--- weights: [164.02, 156.37] class: 0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- weights: [11.93, 10.63] class: 0 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [37.28, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [244.54, 332.47] class: 1 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [50.70, 18.22] class: 0 | | | | | |--- required_car_parking_space_Yes > 0.50 | | | | | | |--- weights: [134.20, 1.52] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [1585.04, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- weights: [180.42, 57.69] class: 0 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [52.19, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.50 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- avg_price_per_room <= 202.95 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- weights: [9.69, 12.14] class: 1 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | |--- weights: [175.20, 28.84] class: 0 | | | | | | | | |--- lead_time > 150.50 | | 
| | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | |--- avg_price_per_room > 202.95 | | | | | | | |--- weights: [0.00, 10.63] class: 1 | | | | | |--- arrival_month > 8.50 | | | | | | |--- weights: [106.61, 106.27] class: 0 | | | | |--- no_of_special_requests > 2.50 | | | | | |--- weights: [67.10, 0.00] class: 0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests <= 0.50 | | | |--- no_of_adults <= 1.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- arrival_month <= 5.00 | | | | | | | |--- weights: [2.98, 0.00] class: 0 | | | | | | |--- arrival_month > 5.00 | | | | | | | |--- weights: [0.75, 24.29] class: 1 | | | | | |--- lead_time > 163.50 | | | | | | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [46.97, 9.11] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- weights: [2.24, 13.66] class: 1 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- weights: [195.33, 12.14] class: 0 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- weights: [13.42, 27.33] class: 1 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 2.50 | | | | | | |--- weights: [8.95, 3.04] class: 0 | | | | | |--- avg_price_per_room > 2.50 | | | | | | |--- weights: [0.75, 97.16] class: 1 | | | |--- no_of_adults > 1.50 | | | | |--- avg_price_per_room <= 82.47 | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | |--- weights: [2.98, 282.37] class: 1 | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- lead_time <= 244.00 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [4.47, 57.69] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [17.89, 0.00] class: 0 | | | 
| | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [11.18, 15.18] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [75.30, 12.14] class: 0 | | | | | | | |--- lead_time > 244.00 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [25.35, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- weights: [26.09, 300.59] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [7.46, 0.00] class: 0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [46.22, 0.00] class: 0 | | | | |--- avg_price_per_room > 82.47 | | | | | |--- no_of_adults <= 2.50 | | | | | | |--- weights: [23.86, 1030.80] class: 1 | | | | | |--- no_of_adults > 2.50 | | | | | | |--- weights: [5.22, 0.00] class: 0 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- weights: [44.73, 12.14] class: 0 | | | | |--- lead_time > 180.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | |--- weights: [12.67, 6.07] class: 0 | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | |--- weights: [7.46, 206.46] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [8.95, 0.00] class: 0 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Offline <= 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 76.48 | | | | | | | |--- weights: [46.97, 4.55] class: 0 | | | | | | |--- avg_price_per_room > 76.48 | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | |--- weights: [156.57, 62.24] class: 0 | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | 
| | | | | | | | | |--- weights: [2.24, 15.18] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [20.88, 15.18] class: 0 | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | |--- weights: [4.47, 13.66] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- weights: [19.38, 34.92] class: 1 | | | | |--- market_segment_type_Offline > 0.50 | | | | | |--- weights: [112.58, 7.59] class: 0 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.50 | | | | |--- weights: [0.00, 3200.19] class: 1 | | | |--- no_of_special_requests > 2.50 | | | | |--- weights: [23.11, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [35.04, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.50 | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | |--- arrival_date > 24.50 | | | | | |--- weights: [3.73, 22.77] class: 1
# Printing the Gini importance of the predictor variables
print(
    pd.DataFrame(
        estimator.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
                                          Imp
lead_time                             0.41477
market_segment_type_Online            0.15245
no_of_special_requests                0.13804
avg_price_per_room                    0.11044
arrival_month                         0.04730
no_of_weekend_nights                  0.02479
no_of_adults                          0.02462
no_of_week_nights                     0.02291
arrival_year                          0.02029
arrival_date                          0.01578
market_segment_type_Offline           0.01247
required_car_parking_space_Yes        0.01151
type_of_meal_plan_Not Selected        0.00229
no_of_previous_cancellations          0.00093
type_of_meal_plan_Meal Plan 2         0.00075
room_type_reserved_Room_Type 5        0.00066
room_type_reserved_Room_Type 6        0.00000
market_segment_type_Corporate         0.00000
market_segment_type_Complementary     0.00000
room_type_reserved_Room_Type 7        0.00000
type_of_meal_plan_Meal Plan 3         0.00000
room_type_reserved_Room_Type 4        0.00000
room_type_reserved_Room_Type 3        0.00000
room_type_reserved_Room_Type 2        0.00000
no_of_children                        0.00000
no_of_previous_bookings_not_canceled  0.00000
repeated_guest_Yes                    0.00000
# level of importance of the predictor variables
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
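Impurity-based (Gini) importances are normalized to sum to 1 and are computed on the training data only, so it can be useful to cross-check them against permutation importance on held-out data. The following is a minimal sketch of that comparison, not part of the original analysis. It runs on synthetic data, and the names `X_demo`, `y_demo`, `cols`, `demo_tree`, and `comparison` are hypothetical stand-ins for the notebook's own objects.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the hotel bookings dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=3, random_state=1
)
cols = [f"feature_{i}" for i in range(X_demo.shape[1])]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

# A small tree, fitted only for illustration
demo_tree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr, y_tr)

# Gini importance: derived from impurity reductions on the training data
gini_imp = pd.Series(demo_tree.feature_importances_, index=cols)

# Permutation importance: mean drop in F1 on held-out data when a column is shuffled
perm = permutation_importance(
    demo_tree, X_te, y_te, scoring="f1", n_repeats=10, random_state=1
)
perm_imp = pd.Series(perm.importances_mean, index=cols)

comparison = pd.DataFrame({"gini": gini_imp, "permutation": perm_imp}).sort_values(
    by="gini", ascending=False
)
print(comparison)
```

If the two rankings broadly agree, the Gini ranking is less likely to be an artifact of how the tree happened to split the training data.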
According to this decision tree model, the four most important variables for predicting whether a booking will be canceled are, in order of importance: lead_time, market_segment_type_Online, no_of_special_requests, and avg_price_per_room.
We will now try post-pruning (cost-complexity pruning) to see whether the decision tree can be improved further.
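As background, and following the description of minimal cost-complexity pruning in the scikit-learn documentation, a tree $T$ is scored by its total leaf impurity plus a penalty on its size. Here $R(T)$ is the total sample-weighted impurity of the leaves, $|\widetilde{T}|$ is the number of leaves, and $T_t$ is the subtree rooted at node $t$:

```latex
R_\alpha(T) = R(T) + \alpha \, |\widetilde{T}|

\alpha_{\mathrm{eff}}(t) = \frac{R(t) - R(T_t)}{|\widetilde{T}_t| - 1}
```

The node with the smallest effective alpha is pruned first. The sequence of effective alphas and the corresponding total leaf impurities is what `cost_complexity_pruning_path` returns.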
clf = DecisionTreeClassifier(random_state=1, class_weight="balanced")
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = abs(path.ccp_alphas), path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.00000 | 0.00838 |
| 1 | 0.00000 | 0.00838 |
| 2 | 0.00000 | 0.00838 |
| 3 | 0.00000 | 0.00838 |
| 4 | 0.00000 | 0.00838 |
| ... | ... | ... |
| 1833 | 0.00890 | 0.32806 |
| 1834 | 0.00980 | 0.33786 |
| 1835 | 0.01272 | 0.35058 |
| 1836 | 0.03412 | 0.41882 |
| 1837 | 0.08118 | 0.50000 |
1838 rows × 2 columns
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
Next, we train a decision tree using the effective alphas. The last value
in ccp_alphas is the alpha value that prunes the whole tree,
leaving the tree, clfs[-1], with one node.
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight="balanced"
    )
    clf.fit(X_train, y_train)
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.08117914389137176
For the remainder, we remove the last element in
clfs and ccp_alphas, because it is the trivial tree with only one
node. Here we show that the number of nodes and the tree depth decrease as alpha
increases.
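The relationship between alpha and tree size can also be checked numerically rather than read off a plot. The sketch below is illustrative only. It uses synthetic data in place of the bookings data, and the names `X_demo`, `y_demo`, `alphas_demo`, and `node_counts_demo` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the hotel bookings dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=1
)

# Effective alphas of the pruning path for an unpruned tree
path_demo = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_demo, y_demo
)
# Clip tiny negative values caused by floating-point error, then sort and deduplicate
alphas_demo = np.unique(np.clip(path_demo.ccp_alphas, 0, None))

# Refit one tree per alpha and record its size
node_counts_demo = []
for alpha in alphas_demo:
    tree_demo = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
    tree_demo.fit(X_demo, y_demo)
    node_counts_demo.append(tree_demo.tree_.node_count)

# Larger alphas should never yield a larger tree
is_non_increasing = all(
    a >= b for a, b in zip(node_counts_demo, node_counts_demo[1:])
)
print(node_counts_demo[0], node_counts_demo[-1], is_non_increasing)
```

Because every tree is grown from the same data with the same `random_state`, each larger alpha prunes a superset of the nodes pruned at the smaller one, which is why the node count cannot go up.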
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
f1_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = f1_score(y_train, pred_train)
    f1_train.append(values_train)
f1_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = f1_score(y_test, pred_test)
    f1_test.append(values_test)
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("F1 Score")
ax.set_title("F1 Score vs alpha for training and testing sets")
ax.plot(ccp_alphas, f1_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, f1_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
index_best_model = np.argmax(f1_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.00012267633155167032,
class_weight='balanced', random_state=1)
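One caveat with this selection step: because alpha is chosen by maximizing F1 on the test set, the test score reported for the selected model is likely to be somewhat optimistic. A common alternative is to choose alpha by cross-validation on the training data only. The following is a minimal sketch of that approach, not the method used in this notebook. It runs on synthetic data, and `X_demo`, `y_demo`, `base`, `search`, and `best_alpha_cv` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in data (an assumption; not the bookings data)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, n_informative=4, weights=[0.67, 0.33], random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=1
)

base = DecisionTreeClassifier(random_state=1, class_weight="balanced")

# Candidate alphas from the pruning path, dropping the last (single-node) tree
path_demo = base.cost_complexity_pruning_path(X_tr, y_tr)
alphas = np.unique(np.clip(path_demo.ccp_alphas, 0, None))[:-1]

# 5-fold cross-validated F1 on the training split only
search = GridSearchCV(base, param_grid={"ccp_alpha": alphas}, scoring="f1", cv=5)
search.fit(X_tr, y_tr)

best_alpha_cv = search.best_params_["ccp_alpha"]
print(best_alpha_cv, search.best_score_)
```

With this setup the test split is touched only once, to evaluate `search.best_estimator_` after alpha has been fixed.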
confusion_matrix_sklearn(best_model, X_train, y_train)
decision_tree_post_perf_train = model_performance_classification_sklearn(
best_model, X_train, y_train
)
decision_tree_post_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.90005 | 0.90350 | 0.81361 | 0.85620 |
confusion_matrix_sklearn(best_model, X_test, y_test)
decision_tree_post_perf_test = model_performance_classification_sklearn(
best_model, X_test, y_test
)
decision_tree_post_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.86879 | 0.85576 | 0.76614 | 0.80848 |
plt.figure(figsize=(15, 12))
out = tree.plot_tree(
    best_model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of the post-pruned decision tree
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- avg_price_per_room <= 196.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time <= 16.50 | | | | | | | | |--- avg_price_per_room <= 68.50 | | | | | | | | | |--- weights: [207.26, 10.63] class: 0 | | | | | | | | |--- avg_price_per_room > 68.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- weights: [2.24, 7.59] class: 1 | | | | | | | |--- lead_time > 16.50 | | | | | | | | |--- avg_price_per_room <= 135.00 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- repeated_guest_Yes <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- repeated_guest_Yes > 0.50 | | | | | | | | | | | |--- weights: [11.18, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [21.62, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 135.00 | | | | | | | | | |--- weights: [0.00, 12.14] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- weights: [1199.59, 0.00] class: 0 | | | | | |--- avg_price_per_room > 196.50 | | | | | | |--- weights: [0.75, 25.81] class: 1 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- lead_time <= 68.50 | | | | | | |--- arrival_month <= 9.50 | | | | | | | |--- avg_price_per_room <= 63.29 | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [41.75, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.75, 3.04] 
class: 1 | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | |--- avg_price_per_room <= 59.75 | | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | | |--- weights: [1.49, 12.14] class: 1 | | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | | |--- weights: [14.91, 1.52] class: 0 | | | | | | | | | |--- avg_price_per_room > 59.75 | | | | | | | | | | |--- lead_time <= 44.00 | | | | | | | | | | | |--- weights: [0.75, 59.21] class: 1 | | | | | | | | | | |--- lead_time > 44.00 | | | | | | | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 63.29 | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [20.13, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | |--- weights: [0.75, 15.18] class: 1 | | | | | | |--- arrival_month > 9.50 | | | | | | | |--- weights: [413.04, 27.33] class: 0 | | | | | |--- lead_time > 68.50 | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | | |--- weights: [15.66, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- lead_time <= 81.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 81.50 | | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | | | | |--- 
arrival_month > 3.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- weights: [55.17, 3.04] class: 0 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- lead_time <= 73.50 | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | | | | |--- lead_time > 73.50 | | | | | | | | | | |--- weights: [21.62, 4.55] class: 0 | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- weights: [8.95, 0.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- avg_price_per_room <= 132.43 | | | | | | | | | |--- weights: [9.69, 122.97] class: 1 | | | | | | | | |--- avg_price_per_room > 132.43 | | | | | | | | | |--- weights: [6.71, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- avg_price_per_room <= 75.07 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | |--- weights: [5.96, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | |--- no_of_previous_bookings_not_canceled <= 1.00 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- weights: [2.24, 118.41] class: 1 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_previous_bookings_not_canceled > 1.00 | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | |--- weights: [31.31, 0.00] class: 0 | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [23.11, 6.07] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [5.96, 9.11] class: 1 | | | | | | |--- avg_price_per_room > 75.07 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | 
| |--- weights: [59.64, 3.04] class: 0 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- weights: [1.49, 16.70] class: 1 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 86.00 | | | | | | | | | | | |--- weights: [2.24, 16.70] class: 1 | | | | | | | | | | |--- avg_price_per_room > 86.00 | | | | | | | | | | | |--- weights: [8.95, 3.04] class: 0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- weights: [44.73, 4.55] class: 0 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- arrival_date <= 11.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- weights: [16.40, 39.47] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [20.13, 6.07] class: 0 | | | | | | |--- arrival_date > 11.50 | | | | | | | |--- avg_price_per_room <= 102.09 | | | | | | | | |--- weights: [5.22, 144.22] class: 1 | | | | | | | |--- avg_price_per_room > 102.09 | | | | | | | | |--- avg_price_per_room <= 109.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [0.75, 16.70] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [33.55, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 109.50 | | | | | | | | | |--- avg_price_per_room <= 124.25 | | | | | | | | | | |--- weights: [2.98, 75.91] class: 1 | | | | | | | | | |--- avg_price_per_room > 124.25 | | | | | | | | | | |--- weights: [3.73, 3.04] class: 0 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_week_nights <= 1.50 | | | | | | |--- arrival_date <= 7.50 | | | | | | | |--- weights: [38.02, 0.00] class: 0 | | | | | | |--- arrival_date > 7.50 | | | | | | | |--- avg_price_per_room <= 93.58 | | | | | | | | |--- 
avg_price_per_room <= 65.38 | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | | | |--- avg_price_per_room > 65.38 | | | | | | | | | |--- weights: [24.60, 3.04] class: 0 | | | | | | | |--- avg_price_per_room > 93.58 | | | | | | | | |--- arrival_date <= 28.00 | | | | | | | | | |--- weights: [14.91, 72.87] class: 1 | | | | | | | | |--- arrival_date > 28.00 | | | | | | | | | |--- weights: [9.69, 1.52] class: 0 | | | | | |--- no_of_week_nights > 1.50 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- weights: [84.25, 0.00] class: 0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- lead_time <= 125.50 | | | | | | | | |--- avg_price_per_room <= 90.85 | | | | | | | | | |--- avg_price_per_room <= 87.50 | | | | | | | | | | |--- weights: [13.42, 13.66] class: 1 | | | | | | | | | |--- avg_price_per_room > 87.50 | | | | | | | | | | |--- weights: [0.00, 15.18] class: 1 | | | | | | | | |--- avg_price_per_room > 90.85 | | | | | | | | | |--- weights: [10.44, 0.00] class: 0 | | | | | | | |--- lead_time > 125.50 | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | |--- weights: [58.15, 18.22] class: 0 | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | |--- weights: [61.88, 1.52] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 99.44 | | | | | |--- arrival_month <= 1.50 | | | | | | |--- weights: [92.45, 0.00] class: 0 | | | | | |--- arrival_month > 1.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- avg_price_per_room <= 70.05 | | | | | | | | | |--- weights: [31.31, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 70.05 | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [38.77, 1.52] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 5.50 | | | 
| | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | | |--- weights: [6.71, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | | |--- weights: [34.30, 40.99] class: 1 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- weights: [0.00, 19.74] class: 1 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- lead_time <= 2.50 | | | | | | | | | | |--- avg_price_per_room <= 74.21 | | | | | | | | | | | |--- weights: [0.75, 3.04] class: 1 | | | | | | | | | | |--- avg_price_per_room > 74.21 | | | | | | | | | | | |--- weights: [9.69, 0.00] class: 0 | | | | | | | | | |--- lead_time > 2.50 | | | | | | | | | | |--- weights: [4.47, 10.63] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | |--- weights: [155.07, 6.07] class: 0 | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- weights: [3.73, 10.63] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [7.46, 0.00] class: 0 | | | | |--- avg_price_per_room > 99.44 | | | | | |--- lead_time <= 3.50 | | | | | | |--- avg_price_per_room <= 202.67 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- weights: [63.37, 30.36] class: 0 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | | |--- weights: [115.56, 12.14] class: 0 | | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | | | |--- weights: [28.33, 3.04] class: 0 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [0.00, 6.07] class: 1 | | | | | | |--- avg_price_per_room > 202.67 | | | | | | | |--- weights: [0.75, 22.77] class: 1 | | | | | |--- 
lead_time > 3.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- avg_price_per_room <= 119.25 | | | | | | | | |--- avg_price_per_room <= 118.50 | | | | | | | | | |--- weights: [18.64, 59.21] class: 1 | | | | | | | | |--- avg_price_per_room > 118.50 | | | | | | | | | |--- weights: [8.20, 1.52] class: 0 | | | | | | | |--- avg_price_per_room > 119.25 | | | | | | | | |--- weights: [34.30, 171.55] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- weights: [26.09, 1.52] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | |--- weights: [9.69, 36.43] class: 1 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- avg_price_per_room <= 208.67 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 208.67 | | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [15.66, 0.00] class: 0 | | | |--- lead_time > 13.50 | | | | |--- required_car_parking_space_Yes <= 0.50 | | | | | |--- avg_price_per_room <= 71.92 | | | | | | |--- avg_price_per_room <= 59.43 | | | | | | | |--- lead_time <= 84.50 | | | | | | | | |--- weights: [50.70, 7.59] class: 0 | | | | | | | |--- lead_time > 84.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | | |--- lead_time <= 131.50 | | | | | | | | | | | |--- weights: [0.75, 15.18] class: 1 | | | | | | | | | | |--- lead_time > 131.50 | | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- weights: [10.44, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | |--- lead_time <= 25.50 | | | | | | | | |--- 
weights: [20.88, 6.07] class: 0 | | | | | | | |--- lead_time > 25.50 | | | | | | | | |--- avg_price_per_room <= 71.34 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- lead_time <= 68.50 | | | | | | | | | | | |--- weights: [15.66, 78.94] class: 1 | | | | | | | | | | |--- lead_time > 68.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- lead_time <= 102.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 102.00 | | | | | | | | | | | |--- weights: [12.67, 3.04] class: 0 | | | | | | | | |--- avg_price_per_room > 71.34 | | | | | | | | | |--- weights: [11.18, 0.00] class: 0 | | | | | |--- avg_price_per_room > 71.92 | | | | | | |--- arrival_year <= 2017.50 | | | | | | | |--- lead_time <= 65.50 | | | | | | | | |--- avg_price_per_room <= 120.45 | | | | | | | | | |--- weights: [79.77, 9.11] class: 0 | | | | | | | | |--- avg_price_per_room > 120.45 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [3.73, 12.14] class: 1 | | | | | | | |--- lead_time > 65.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- weights: [16.40, 47.06] class: 1 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- weights: [0.00, 63.76] class: 1 | | | | | | |--- arrival_year > 2017.50 | | | | | | | |--- avg_price_per_room <= 104.31 | | | | | | | | |--- lead_time <= 25.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [16.40, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- weights: [38.77, 118.41] class: 1 
| | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [23.11, 0.00] class: 0 | | | | | | | | |--- lead_time > 25.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- weights: [39.51, 185.21] class: 1 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [73.81, 411.41] class: 1 | | | | | | | |--- avg_price_per_room > 104.31 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 195.30 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 195.30 | | | | | | | | | | | |--- weights: [0.75, 138.15] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- weights: [11.18, 6.07] class: 0 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- weights: [0.75, 9.11] class: 1 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- avg_price_per_room <= 168.06 | | | | | | | | | | |--- lead_time <= 22.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 22.00 | | | | | | | | | | | |--- weights: [17.15, 83.50] class: 1 | | | | | | | | | |--- avg_price_per_room > 168.06 | | | | | | | | | | |--- weights: [12.67, 6.07] class: 0 | | | | |--- required_car_parking_space_Yes > 0.50 | | | | | |--- weights: [48.46, 1.52] class: 0 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- lead_time <= 102.50 | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | |--- weights: [697.09, 9.11] class: 0 | | | | | |--- type_of_meal_plan_Not Selected > 
0.50 | | | | | | |--- lead_time <= 63.00 | | | | | | | |--- weights: [15.66, 1.52] class: 0 | | | | | | |--- lead_time > 63.00 | | | | | | | |--- weights: [0.00, 7.59] class: 1 | | | | |--- lead_time > 102.50 | | | | | |--- no_of_week_nights <= 2.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [31.31, 13.66] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- weights: [0.75, 6.07] class: 1 | | | | | |--- no_of_week_nights > 2.50 | | | | | | |--- weights: [44.73, 3.04] class: 0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 8.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- no_of_week_nights <= 10.00 | | | | | | | |--- weights: [498.03, 40.99] class: 0 | | | | | | |--- no_of_week_nights > 10.00 | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | |--- lead_time > 4.50 | | | | | | |--- arrival_date <= 13.50 | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | |--- weights: [58.90, 36.43] class: 0 | | | | | | | |--- arrival_month > 9.50 | | | | | | | | |--- weights: [33.55, 1.52] class: 0 | | | | | | |--- arrival_date > 13.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- weights: [123.76, 9.11] class: 0 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- avg_price_per_room <= 126.33 | | | | | | | | | |--- weights: [32.80, 3.04] class: 0 | | | | | | | | |--- avg_price_per_room > 126.33 | | | | | | | | | |--- weights: [9.69, 13.66] class: 1 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space_Yes <= 0.50 | | | | | | |--- avg_price_per_room <= 118.55 | | | | | | | |--- lead_time <= 61.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [70.08, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- no_of_week_nights 
> 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [126.74, 1.52] class: 0 | | | | | | | |--- lead_time > 61.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- weights: [4.47, 57.69] class: 1 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- lead_time <= 66.50 | | | | | | | | | | | |--- weights: [5.22, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 66.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 71.93 | | | | | | | | | | | |--- weights: [54.43, 3.04] class: 0 | | | | | | | | | | |--- avg_price_per_room > 71.93 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- avg_price_per_room > 118.55 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | |--- no_of_week_nights <= 7.50 | | | | | | | | | | |--- avg_price_per_room <= 177.15 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 177.15 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_week_nights > 7.50 | | | | | | | | | | |--- weights: [0.00, 6.07] class: 1 | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- avg_price_per_room <= 121.20 | | | | | | | | | | | |--- weights: [18.64, 6.07] class: 0 | | | | | | | | | | |--- avg_price_per_room > 121.20 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_date > 
27.50 | | | | | | | | | | |--- lead_time <= 55.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 55.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- weights: [11.93, 10.63] class: 0 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [37.28, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- avg_price_per_room <= 119.20 | | | | | | | | | | | |--- weights: [9.69, 28.84] class: 1 | | | | | | | | | | |--- avg_price_per_room > 119.20 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- weights: [49.95, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- weights: [0.75, 18.22] class: 1 | | | | | |--- required_car_parking_space_Yes > 0.50 | | | | | | |--- weights: [134.20, 1.52] class: 0 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [1585.04, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | |--- lead_time <= 6.50 | | | | | | | | |--- weights: [32.06, 0.00] class: 0 | | | | | | | |--- lead_time > 6.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | | |--- weights: [23.11, 1.52] class: 0 | | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | | |--- avg_price_per_room <= 93.09 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 93.09 | | | | | | | | | | | |--- weights: [77.54, 27.33] class: 0 | | | | | | | | |--- 
arrival_month > 11.50 | | | | | | | | | |--- weights: [19.38, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [52.19, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.50 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- avg_price_per_room <= 202.95 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- weights: [1.49, 9.11] class: 1 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- weights: [8.20, 3.04] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | |--- weights: [175.20, 28.84] class: 0 | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | |--- avg_price_per_room > 202.95 | | | | | | | |--- weights: [0.00, 10.63] class: 1 | | | | | |--- arrival_month > 8.50 | | | | | | |--- avg_price_per_room <= 153.15 | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 71.12 | | | | | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 71.12 | | | | | | | | | |--- avg_price_per_room <= 90.42 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- weights: [12.67, 7.59] class: 0 | | | | | | | | | |--- avg_price_per_room > 90.42 | | | | | | | | | | |--- weights: [64.12, 60.72] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | |--- weights: [5.96, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 153.15 | | | | | | | |--- weights: [12.67, 3.04] class: 0 | | | | |--- no_of_special_requests > 2.50 | | | | | |--- weights: [67.10, 0.00] class: 0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 
| | |--- no_of_special_requests <= 0.50 | | | |--- no_of_adults <= 1.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- arrival_month <= 5.00 | | | | | | | |--- weights: [2.98, 0.00] class: 0 | | | | | | |--- arrival_month > 5.00 | | | | | | | |--- weights: [0.75, 24.29] class: 1 | | | | | |--- lead_time > 163.50 | | | | | | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [46.97, 9.11] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | |--- weights: [0.00, 13.66] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- weights: [6.71, 0.00] class: 0 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- weights: [188.62, 7.59] class: 0 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- weights: [13.42, 27.33] class: 1 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 2.50 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- weights: [8.20, 0.00] class: 0 | | | | | | |--- lead_time > 285.50 | | | | | | | |--- weights: [0.75, 3.04] class: 1 | | | | | |--- avg_price_per_room > 2.50 | | | | | | |--- weights: [0.75, 97.16] class: 1 | | | |--- no_of_adults > 1.50 | | | | |--- avg_price_per_room <= 82.47 | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | |--- weights: [2.98, 282.37] class: 1 | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- lead_time <= 244.00 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- 
no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- lead_time <= 166.50 | | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 166.50 | | | | | | | | | | | |--- weights: [2.24, 57.69] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [17.89, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- weights: [11.18, 3.04] class: 0 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [0.00, 12.14] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [75.30, 12.14] class: 0 | | | | | | | |--- lead_time > 244.00 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [25.35, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- weights: [11.18, 264.15] class: 1 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [7.46, 0.00] class: 0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [46.22, 0.00] class: 0 | | | | |--- avg_price_per_room > 82.47 | | | | | |--- no_of_adults <= 2.50 | | | | | | |--- lead_time <= 324.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- weights: [7.46, 986.78] class: 1 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- weights: [0.00, 10.63] class: 1 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0 | | | | | | | |--- arrival_month 
> 11.50 | | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | | |--- weights: [5.22, 0.00] class: 0 | | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | | |--- weights: [0.00, 19.74] class: 1 | | | | | | |--- lead_time > 324.50 | | | | | | | |--- avg_price_per_room <= 89.00 | | | | | | | | |--- weights: [5.96, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 89.00 | | | | | | | | |--- weights: [0.75, 13.66] class: 1 | | | | | |--- no_of_adults > 2.50 | | | | | | |--- weights: [5.22, 0.00] class: 0 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [5.96, 0.00] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- weights: [1.49, 7.59] class: 1 | | | | | |--- lead_time > 159.50 | | | | | | |--- arrival_date <= 1.50 | | | | | | | |--- weights: [1.49, 3.04] class: 1 | | | | | | |--- arrival_date > 1.50 | | | | | | | |--- weights: [35.79, 1.52] class: 0 | | | | |--- lead_time > 180.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | |--- avg_price_per_room <= 96.37 | | | | | | | | |--- weights: [12.67, 3.04] class: 0 | | | | | | | |--- avg_price_per_room > 96.37 | | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | |--- weights: [7.46, 206.46] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [8.95, 0.00] class: 0 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Offline <= 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 76.48 | | | | | | | |--- weights: [46.97, 4.55] class: 0 | | | | | | |--- avg_price_per_room > 76.48 | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | |--- lead_time <= 233.00 
| | | | | | | | | | |--- lead_time <= 152.50 | | | | | | | | | | | |--- weights: [1.49, 4.55] class: 1 | | | | | | | | | | |--- lead_time > 152.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 233.00 | | | | | | | | | | |--- weights: [23.11, 19.74] class: 0 | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [2.24, 15.18] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- lead_time <= 269.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 269.00 | | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | |--- weights: [4.47, 13.66] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- arrival_date <= 14.50 | | | | | | | |--- weights: [8.20, 3.04] class: 0 | | | | | | |--- arrival_date > 14.50 | | | | | | | |--- weights: [11.18, 31.88] class: 1 | | | | |--- market_segment_type_Offline > 0.50 | | | | | |--- lead_time <= 348.50 | | | | | | |--- weights: [106.61, 3.04] class: 0 | | | | | |--- lead_time > 348.50 | | | | | | |--- weights: [5.96, 4.55] class: 0 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.50 | | | | |--- weights: [0.00, 3200.19] class: 1 | | | |--- no_of_special_requests > 2.50 | | | | |--- weights: [23.11, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [35.04, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.50 | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | |--- arrival_date > 24.50 | | | | | |--- weights: [3.73, 22.77] class: 1
# Printing the Gini importance of the predictor variables
print(
pd.DataFrame(
best_model.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
                                          Imp
lead_time                             0.39285
market_segment_type_Online            0.13423
avg_price_per_room                    0.12439
no_of_special_requests                0.12071
arrival_month                         0.05955
arrival_date                          0.03479
no_of_week_nights                     0.02889
no_of_weekend_nights                  0.02606
no_of_adults                          0.02509
arrival_year                          0.01892
market_segment_type_Offline           0.01161
required_car_parking_space_Yes        0.01006
type_of_meal_plan_Not Selected        0.00546
room_type_reserved_Room_Type 4        0.00190
room_type_reserved_Room_Type 6        0.00096
no_of_previous_bookings_not_canceled  0.00082
room_type_reserved_Room_Type 2        0.00081
market_segment_type_Corporate         0.00077
type_of_meal_plan_Meal Plan 2         0.00066
room_type_reserved_Room_Type 5        0.00058
no_of_children                        0.00056
repeated_guest_Yes                    0.00034
market_segment_type_Complementary     0.00000
room_type_reserved_Room_Type 7        0.00000
room_type_reserved_Room_Type 3        0.00000
no_of_previous_cancellations          0.00000
type_of_meal_plan_Meal Plan 3         0.00000
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
According to this decision tree model, the four most important variables for predicting whether a booking will be cancelled are, in order of importance: lead_time, market_segment_type_Online, avg_price_per_room and no_of_special_requests.
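As a minimal sketch of how this ranking can be read off programmatically, the snippet below sorts a handful of importances and keeps the largest four. The values are copied from the printed output above; `top_k_features` is a hypothetical helper and is not part of the original notebook.

```python
# Importances copied from the printed output above (six largest entries only).
importances = {
    "lead_time": 0.39285,
    "market_segment_type_Online": 0.13423,
    "avg_price_per_room": 0.12439,
    "no_of_special_requests": 0.12071,
    "arrival_month": 0.05955,
    "arrival_date": 0.03479,
}


def top_k_features(imp, k=4):
    """Return the k (name, importance) pairs with the largest importance.

    Hypothetical helper for illustration; not defined in the notebook.
    """
    return sorted(imp.items(), key=lambda item: item[1], reverse=True)[:k]


top4 = top_k_features(importances)
for name, value in top4:
    print(f"{name:<28} {value:.5f}")

# Sum of the four largest importances. The full printed list sums to 1,
# so this is also their share of the total importance.
top4_share = sum(value for _, value in top4)
print(f"Top-4 share of total importance: {top4_share:.3f}")
```

On the printed values, the four leading features together account for roughly 77% of the total Gini importance, which is why the remaining predictors contribute comparatively little to the splits.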
# training set performance comparison
models_train_comp_df = pd.concat(
[
decision_tree_perf_train.T,
decision_tree_pretune1_perf_train.T,
decision_tree_pretune2_perf_train.T,
decision_tree_post_perf_train.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Decision Tree sklearn",
"Decision Tree (Pre-Pruning1)",
"Decision Tree (Pre-Pruning2)",
"Decision Tree (Post-Pruning)",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Decision Tree sklearn | Decision Tree (Pre-Pruning1) | Decision Tree (Pre-Pruning2) | Decision Tree (Post-Pruning) |
|---|---|---|---|---|
| Accuracy | 0.99421 | 0.82239 | 0.87295 | 0.90005 |
| Recall | 0.98661 | 0.77317 | 0.84132 | 0.90350 |
| Precision | 0.99578 | 0.71219 | 0.78747 | 0.81361 |
| F1 | 0.99117 | 0.74143 | 0.81350 | 0.85620 |
# test set performance comparison
models_test_comp_df = pd.concat(
[
decision_tree_perf_test.T,
decision_tree_pretune1_perf_test.T,
decision_tree_pretune2_perf_test.T,
decision_tree_post_perf_test.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Decision Tree sklearn",
"Decision Tree (Pre-Pruning1)",
"Decision Tree (Pre-Pruning2)",
"Decision Tree (Post-Pruning)",
]
print("Test performance comparison:")
models_test_comp_df
Test performance comparison:
| | Decision Tree sklearn | Decision Tree (Pre-Pruning1) | Decision Tree (Pre-Pruning2) | Decision Tree (Post-Pruning) |
|---|---|---|---|---|
| Accuracy | 0.87090 | 0.82395 | 0.86116 | 0.86879 |
| Recall | 0.81034 | 0.77030 | 0.82567 | 0.85576 |
| Precision | 0.79476 | 0.71021 | 0.76426 | 0.76614 |
| F1 | 0.80247 | 0.73904 | 0.79378 | 0.80848 |
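One way to compare the four models is to look at the gap between training and test scores, since a large gap usually points to overfitting. The sketch below computes that gap from the values reported in the two tables above; the `generalization_gap` helper and the dictionary layout are assumptions made for illustration, not code from the notebook.

```python
METRICS = ("Accuracy", "Recall", "Precision", "F1")

# Scores copied from the training and test comparison tables above,
# in the order given by METRICS.
train_scores = {
    "Decision Tree sklearn": (0.99421, 0.98661, 0.99578, 0.99117),
    "Decision Tree (Pre-Pruning1)": (0.82239, 0.77317, 0.71219, 0.74143),
    "Decision Tree (Pre-Pruning2)": (0.87295, 0.84132, 0.78747, 0.81350),
    "Decision Tree (Post-Pruning)": (0.90005, 0.90350, 0.81361, 0.85620),
}
test_scores = {
    "Decision Tree sklearn": (0.87090, 0.81034, 0.79476, 0.80247),
    "Decision Tree (Pre-Pruning1)": (0.82395, 0.77030, 0.71021, 0.73904),
    "Decision Tree (Pre-Pruning2)": (0.86116, 0.82567, 0.76426, 0.79378),
    "Decision Tree (Post-Pruning)": (0.86879, 0.85576, 0.76614, 0.80848),
}


def generalization_gap(train, test):
    """Return {model: {metric: train score minus test score}} (hypothetical helper)."""
    return {
        model: {
            metric: round(tr - te, 5)
            for metric, tr, te in zip(METRICS, train[model], test[model])
        }
        for model in train
    }


gaps = generalization_gap(train_scores, test_scores)
for model, gap in gaps.items():
    print(f"{model:<30} F1 gap = {gap['F1']:+.5f}  Recall gap = {gap['Recall']:+.5f}")
```

On these numbers the unpruned tree has by far the largest F1 gap (about 0.19), while the pruned trees all stay below 0.05. The post-pruned tree also has the highest test recall and test F1 of the four.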
lead_time, market_segment_type_Online, avg_price_per_room and no_of_special_requests are the four most important predictors of whether a booking will be cancelled.
Repeat guests and guests who make special requests are less likely to cancel their bookings.
93% (33742 out of 36275) of bookings come from the Online and Offline market segments.
94% (34187 out of 36275) of the bookings are for Room_Type 1 and Room_Type 4.
Meal Plan 3 is rarely chosen by guests (5 out of 36275 bookings). The hotel could consider discontinuing it or replacing it with a different offering.
August, September and October are the busiest months for the hotel.
Repeat guests rarely cancel, but they make up only 2.6% (930 out of 36275) of total bookings. The hotel management should introduce a loyalty package with attractive incentives to raise this share.
The longer the time between booking and arrival (lead_time), the more likely the booking is to be cancelled. To reduce cancellations, management is advised to make rates for bookings made far in advance more attractive (for example, lower prices) but subject to a non-refundable deposit when the lead time is greater than 60 days (the median lead_time is 57 days).
Management should look for ways to take advantage of the higher patronage from August to October in order to maximise profit during these months. Options include something as counter-intuitive as a slight decrease in prices to see whether it drives the occupancy rate up further, a discount for longer stays, or a small (nothing drastic) increase in room prices. Further investigation of the data is needed to see which of these would be most favourable. At the same time, the so-called lean months should not be neglected: incentives such as promotions could help drive up occupancy during these months as well.
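The lead-time deposit rule suggested above can be sketched as a simple threshold check. This is only an illustration under stated assumptions: the 60-day threshold comes from the recommendation, while `requires_deposit` and the sample lead times are hypothetical and are not taken from the dataset.

```python
from statistics import median

# Threshold (in days) taken from the recommendation above.
DEPOSIT_LEAD_TIME_DAYS = 60


def requires_deposit(lead_time_days, threshold=DEPOSIT_LEAD_TIME_DAYS):
    """Return True when a booking is made more than `threshold` days ahead.

    Hypothetical helper for illustration; not part of the notebook.
    """
    return lead_time_days > threshold


# Hypothetical lead times in days, for illustration only.
sample_lead_times = [3, 14, 45, 57, 61, 120, 200]

flagged = [lt for lt in sample_lead_times if requires_deposit(lt)]
share = len(flagged) / len(sample_lead_times)

print(f"Median lead time in sample: {median(sample_lead_times)} days")
print(f"Bookings that would need a deposit: {flagged} ({share:.0%} of the sample)")
```

In practice the threshold would be tuned against the real `lead_time` distribution and the observed cancellation rate above and below it, rather than fixed in advance.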